22 results for “topic:prompt-compression”
🦞 LLM Token Compression & Reduction Tool — Cut AI agent token costs by up to 97%. 6-layer deterministic context compression for AI agent workspaces. No LLM required. Prompt compression, context window optimization & cost reduction for any LLM pipeline.
JavaScript/TypeScript implementation of LLMLingua-2 (Experimental)
Python command-line tool for interacting with AI models through the OpenRouter API, Cloudflare AI Gateway, or a local self-hosted Ollama instance. Optionally supports Microsoft LLMLingua prompt token compression.
Rolling context compression for Claude Code — never hit the context wall. Auto-compresses old messages while keeping recent context verbatim. Zero config, zero latency. Works as a Claude Code plugin.
CUTIA: compress prompts while preserving quality
This repository is the official implementation of Generative Context Distillation.
TOON for TYPO3 — a compact, human-readable, and token-efficient data format for AI prompts & LLM contexts. Perfect for ChatGPT, Gemini, Claude, Mistral, and OpenAI integrations (JSON ⇄ TOON).
API gateway for LLM prompt compression with policy enforcement built on LLMLingua. Demonstrates cost control, prompt safety, and LLM execution boundaries.
LLMLingua-2 prompt compression hook for Claude Code — cut token usage by ~55%
Compress LLM prompts and save 80%+ on GPT-4 costs in Python
Enhance the performance and cost-efficiency of large-scale Retrieval Augmented Generation (RAG) applications. Learn to integrate vector search with traditional database operations and apply techniques like prefiltering, postfiltering, projection, and prompt compression.
End-to-End Python implementation of CompactPrompt (Choi et al., 2025): a unified pipeline for LLM prompt and data compression. Features modular compression pipeline with dependency-driven phrase pruning, reversible n-gram encoding, K-means quantization, and embedding-based exemplar selection. Achieves 2-4x token reduction while preserving accuracy.
A fast, Unix-style CLI tool for semantic prompt compression. Cuts LLM prompt tokens by 10-20x with >90% fidelity, saving costs and latency.
This repository contains the code and data of the paper titled "FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution."
LLM context compression proxy — 40-70% token savings, zero code changes
LLM cost monitoring and optimization toolkit
RL-Prompt-Compression uses graph-enhanced reinforcement learning to optimize prompt compression and improve model efficiency: a Phi-3 compressor is trained via GRPO with a TinyLlama evaluator and a MiniLM cross-encoder feedback model.
CATALYST - Lightning-fast optimization plugin for Claude Code + Ollama. Achieves 3-4x speedup through intelligent prompt compression, smart caching, and task-aware planning. Zero dependencies, MIT licensed, production-ready.
PAKT: Lossless prompt compression for LLMs. 30-50% fewer tokens on JSON/YAML/CSV/Markdown. Perfect round-trip fidelity. TypeScript library + CLI + Chrome extension + Tauri desktop app.
Prompt compaction and shorthand codec for LLM workflows
No description provided.
A compact Chrome extension built to cut down prompt size and maximize token savings. It compresses text while preserving meaning, helping you use fewer tokens and lower your costs.
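Several of the results above (the OpenRouter/Ollama CLI, the policy-enforcing gateway, and the Claude Code hook) build on Microsoft's LLMLingua. For orientation, here is a minimal sketch of the kind of call those projects wrap, assuming the pip-installable llmlingua package and its published LLMLingua-2 checkpoint; the model name, sample prompt, and rate value are illustrative and not taken from any repository in this list.

```python
# Minimal LLMLingua-2 usage sketch (assumes: pip install llmlingua).
# The checkpoint name and rate are illustrative defaults, not settings
# from any of the repositories listed above.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

# Placeholder input: in practice this would be retrieved RAG context
# or an agent's accumulated conversation history.
long_prompt = (
    "Meeting notes: the team reviewed the Q3 roadmap, discussed the "
    "migration to the new billing system, and agreed to ship the beta "
    "of the compression proxy by the end of the month."
)

result = compressor.compress_prompt(long_prompt, rate=0.5)  # keep ~50% of tokens
print(result["compressed_prompt"])
print(f'{result["origin_tokens"]} -> {result["compressed_tokens"]} tokens')
```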