# TokenPak
Zero-token operations. Maximum context efficiency.
TokenPak is an open-source LLM proxy that compresses context, routes requests intelligently, and tracks costs — all without touching your prompts or credentials.
## Why TokenPak?
LLM APIs charge per token. Most conversations are bloated with repetitive context, verbose code comments, and redundant structure. TokenPak fixes that at the proxy layer — transparently, locally, without ever seeing your content.
| Metric | Value |
|---|---|
| Average token reduction | 43–84% |
| Zero-token operations | 80%+ |
| Cold start overhead | < 100ms |
| Indexing throughput | 2,700+ files/sec |
## Core Principles

- **Local and private**: We never see your prompts, code, or responses. Everything happens locally.
- **Pure passthrough**: Your API keys go directly to providers, never stored by TokenPak (see the sketch below).
- **No lock-in**: Downgrade anytime. Keep all your data. No vendor dependencies.
- **Free core**: Status, search, cost reports — all free. CLI-first, deterministic.
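To illustrate the passthrough claim, here is a minimal sketch (not TokenPak's actual implementation; the `httpx` call and the function shape are assumptions): the client's `Authorization` header is relayed upstream verbatim and never persisted.

```python
# Minimal sketch of a pure passthrough (illustrative only, not TokenPak's code).
# The caller's API key is relayed to the upstream provider as-is and is
# never logged, cached, or written to disk.
import httpx

def forward(headers: dict[str, str], body: bytes, upstream_url: str) -> httpx.Response:
    return httpx.post(
        upstream_url,
        content=body,
        headers={
            "Authorization": headers["Authorization"],  # passed through verbatim
            "Content-Type": "application/json",
        },
    )
```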
## Quick Start

Start the TokenPak proxy locally, then point your LLM client at http://localhost:8766. That's it. See Getting Started for the full walkthrough.
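For example, with the OpenAI Python SDK the only change is the base URL. This is a sketch assuming TokenPak exposes an OpenAI-compatible endpoint; the `/v1` path prefix and the model name are assumptions, so adjust them to your setup:

```python
# Sketch: route an existing OpenAI SDK client through TokenPak.
# Assumptions: the proxy serves an OpenAI-compatible endpoint on
# http://localhost:8766 and forwards your key upstream; the model
# name is illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8766/v1",  # the TokenPak proxy, not api.openai.com
    api_key="sk-...",                     # your real provider key, passed through
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through the proxy!"}],
)
print(response.choices[0].message.content)
```

Because TokenPak is a pure passthrough, the key in `api_key` reaches your provider unchanged.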
## What's Inside

- :material-fast-forward: **Getting Started**: Install TokenPak and run your first compressed request in 5 minutes.
- :material-console: **CLI Reference**: Every command, every flag, with examples.
- :material-lan: **Proxy Setup**: Connect Claude Code, OpenAI clients, or any HTTP-based LLM tool.
- :material-chef-hat: **Recipe Development**: Build custom compression recipes for your domain.
- :material-chart-bar: **Telemetry & Dashboard**: Track costs, view savings, export reports.
- :material-server: **Team Server**: Deploy a shared TokenPak instance for your whole team.