Feature Matrix¶

All features are FREE and open source under the Apache 2.0 license.

Core Features¶

Category	Feature	Status	Notes
Core Routing	Multiple provider adapters	✅	Built-in adapters: Anthropic, OpenAI, Google, xAI/Grok, Ollama
	Fallback chains	✅	Auto-failover to backup providers
	Circuit breaker	✅	Recovers from rate limits
Token Management	Token counting (all providers)	✅	Unified across Anthropic, OpenAI, Google
	Cost tracking	✅	Basic tracking + reporting
Compression	Deduplication	✅	Remove repeated content
	Document compression	✅	Summarize long docs
	Instruction table	✅	Compress repetitive instructions
Error Handling	Normalized error messages	✅	Consistent across providers
	Automatic retries	✅	Exponential backoff, configurable
	Error telemetry	✅	Log error types and frequency
Observability	Request/response logging	✅	JSON logs, searchable
	Token usage reports	✅	CSV export, JSON export
Agentic	Error normalization	✅	Convert errors to agent-readable format
	Streaming support	✅	Handle streaming + non-streaming
Vault Integration	Document indexing	✅	Index local files (.md, .txt, .pdf)
	Semantic search	✅	Search vault by meaning
	Auto-injection	✅	Automatically add relevant docs to context
	Symbol extraction	✅	Extract functions, classes, variables
	AST parsing	✅	Parse code structure
	Chunk optimization	✅	Smart chunking for injection
	Watcher mode	✅	Live re-index on file changes
CLI	`serve` command	✅	Start the proxy
	`preview` command	✅	Dry-run compression on a file (shows token savings)
	`benchmark` command	✅	Test compression on a document
	`validate` command	✅	Validate a TokenPak JSON file against the schema
	`report` command	✅	Generate usage reports

Feature Details¶

Core Proxy¶

Provider routing, adapters, tool schema handling, fallback chains, circuit breaker, streaming, passthrough.

TokenPak's primary usage model is to run the local proxy and point your existing provider SDK or tool at it via a base URL — no application code changes required:

# Start the proxy (default: http://127.0.0.1:8766)
tokenpak serve

# Use the standard Anthropic SDK, routed through TokenPak
import anthropic

client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8766",
    api_key="sk-ant-...",
)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

Token & Cost Tracking¶

TokenPak tracks per-request token usage and savings server-side. Inspect them over HTTP or via the CLI:

curl http://127.0.0.1:8766/stats/last    # most recent request
tokenpak stats                           # registry totals
tokenpak savings                         # tokens and cost saved

Compression¶

Deduplication, doc compression, instruction table, budget tracking, fidelity tiers

Automatically applied. Semantic equivalence guaranteed.

Error Handling¶

Normalized errors, automatic retries with exponential backoff, and circuit breaking are applied transparently by the proxy. When an upstream provider fails repeatedly, the proxy opens a circuit breaker (visible at GET /circuit-breakers) and surfaces a consistent JSON error to the client. See the Error Handling Guide.

Vault Features¶

Indexing, search, auto-injection, symbol extraction, AST parsing, chunking, watcher, SQLite backend

vault:
  enabled: true
  root: "~/my-vault"
  auto_inject: true  # Automatically add relevant docs

Installation & Usage¶

pip install tokenpak
tokenpak serve

Configuration Reference¶

Basic config.yaml¶

proxy:
  port: 8766
  host: 127.0.0.1

provider: anthropic
fallback:
  - google
  - openai

compression:
  enabled: true

telemetry:
  enabled: true
  log_file: /tmp/tokenpak.log

vault:
  enabled: true
  root: ~/my-vault

Feature Roadmap¶

These items are on the roadmap and are not part of the current OSS beta surface — they are directions, not commitments. For exactly what ships in the beta today, see the feature overview.

Multi-turn conversation history management
Vision / multimodal support
Advanced batch processing

TokenPak already ships deterministic Prompt Packing today (see the feature overview); it interoperates with provider-side prompt caching rather than replacing it.

Support & Licensing¶

License: Apache 2.0. Use however you like.

See README for more information.