Skip to content

TokenPak

A local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results.

TokenPak sits between your AI tools and the upstream LLM provider, running entirely on 127.0.0.1. It deterministically packages context (Prompt Packing), routes requests, blocks runaway spend before it hits the wire (Spend Guard), and logs every saving locally. No cloud, no credentials stored, no code changes.

OSS beta

These docs describe the OSS beta of TokenPak (pip install tokenpak, currently v1.9.3, Apache 2.0). Anything not listed here is not part of the beta surface. See Known Issues for current limitations.


What ships in the OSS beta

  • Prompt Packing pipeline — deterministic context reduction on real agent workloads; reduction pinned to an agent-style CI fixture (reproduce with make benchmark-headline); provider-cached flows show lower incremental gains. Measure your own with tokenpak savings.
  • Local proxy on 127.0.0.1 — byte-preserved passthrough; your prompts and credentials never leave your machine.
  • Spend Guard — pre-send circuit breaker with rolling caps; blocks runaway requests before they reach the provider and returns a clear release directive.
  • Nine client integrations — Claude Code, Cursor, Cline, Continue, Aider, Codex CLI, Gemini CLI, OpenAI SDK, Anthropic SDK.
  • Savings Ledger + local dashboard — every request logged to a local SQLite store with causal attribution; TUI + web dashboard.
  • Vault indexing + semantic search — index your codebase, search without an LLM call.
  • TIP-1.0 protocol contracts — canonical headers, metadata fields, capability labels, manifest schemas. Conformance gate runnable via tokenpak doctor --conformance.
  • Pak recall (read-only) — storage, FTS, tokenpak pak inspect. Scoring and assembly are not part of the OSS beta.
  • 50 built-in compression profiles — YAML, customizable.

Quick start

pip install tokenpak
tokenpak setup
# Then point your client at http://127.0.0.1:8766

Full installation guide5-minute Quick Start


Documentation map

Section What it covers
Installation pip install, system requirements, first run
Quick Start Setup wizard, client integration, first savings in 5 minutes
Configuration How configuration works (env vars + YAML, precedence)
Environment Variables Complete TOKENPAK_* reference
CLI Reference Every verb, flag, and exit code (auto-generated)
Architecture Three planes, modular subsystems, proxy-centered design
Savings How TokenPak attributes savings causally
Security Auth tokens, TLS, audit logging, data privacy
Troubleshooting Common symptoms and fixes that work
Known Issues Current OSS-beta limitations
FAQ General questions
Recall overview Paks, reason codes, risk flags — the OSS data plane
Client Guides Per-client integration walkthroughs (Claude Code, Cursor, Cline, Continue, Aider, Codex CLI, Gemini CLI, OpenAI/Anthropic SDK)

Source + package