
TokenPak

The intelligent LLM proxy for context compression and vault injection.

TokenPak sits between your AI tools and the upstream LLM API, automatically compressing context, injecting vault knowledge, and caching tokens to slash costs and latency.


✨ Key Features

  • Context compression — Up to 40% token reduction on large payloads
  • Vault injection — Automatically enrich requests with relevant knowledge
  • Token caching — Reduce repeat API costs with smart cache hits
  • Drop-in proxy — Works with any OpenAI-compatible client
  • Multi-provider — Routes to Anthropic, OpenAI, Gemini, and more

🚀 Quick Start

pip install tokenpak
tokenpak start --port 8766
# Point your client to http://localhost:8766
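Because TokenPak exposes an OpenAI-compatible endpoint, a client only needs to target the local port. A minimal Python sketch using only the standard library — the `/v1/chat/completions` path, the payload shape, and the model name follow the generic OpenAI wire format and are assumptions, not confirmed TokenPak behavior:

```python
import json
import urllib.request

# Assumed proxy endpoint: TokenPak listening locally on the port from
# `tokenpak start --port 8766`, speaking the OpenAI chat-completions format.
TOKENPAK_URL = "http://localhost:8766/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build a chat-completion request aimed at the local proxy.

    The model name is a placeholder; TokenPak routes it to the
    configured upstream provider.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        TOKENPAK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Summarize this document.")
# With the proxy running, send it with: urllib.request.urlopen(req)
```

Any SDK that lets you override its base URL (most OpenAI-compatible clients do) can be pointed at the same address instead of hand-building requests.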

  • Full installation guide
  • Quick start guide


📚 Documentation

  • Getting Started — Install and configure TokenPak
  • Configuration — All configuration options
  • CLI Reference — Command-line interface
  • Architecture — How TokenPak works
  • FAQ — Common questions