
# TokenPak

Zero-token operations. Maximum context efficiency.

TokenPak is an open-source LLM proxy that compresses context, routes requests intelligently, and tracks costs — all without touching your prompts or credentials.

Python 3.11+ · License: MIT


## Why TokenPak?

LLM APIs charge per token. Most conversations are bloated with repetitive context, verbose code comments, and redundant structure. TokenPak fixes that at the proxy layer — transparently, locally, without ever seeing your content.

| Metric | Value |
| --- | --- |
| Average token reduction | 43–84% |
| Zero-token operations | 80%+ |
| Cold start overhead | < 100 ms |
| Indexing throughput | 2,700+ files/sec |

## Core Principles

- **Local-first.** We never see your prompts, code, or responses. Everything happens locally.
- **Pure passthrough.** Your API keys go directly to providers and are never stored by TokenPak.
- **No lock-in.** Downgrade anytime. Keep all your data. No vendor dependencies.
- **Free core.** Status, search, and cost reports are all free. CLI-first, deterministic.


## Quick Start

```bash
pip install tokenpak
tokenpak serve --port 8766
```

Then point your LLM client at `http://localhost:8766`. That's it. See Getting Started for the full walkthrough.
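
As a quick illustration, here is a minimal sketch of sending a request through the proxy with the official `openai` Python SDK. The `/v1` path prefix and OpenAI-compatible endpoint are assumptions about TokenPak's route layout; see Proxy Setup for the exact configuration.

```python
# Minimal sketch of routing an OpenAI SDK client through TokenPak.
# Assumption: the proxy exposes an OpenAI-compatible endpoint under /v1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8766/v1",  # TokenPak proxy instead of the provider
    api_key="sk-...",                     # your real provider key, forwarded as-is
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through TokenPak!"}],
)
print(response.choices[0].message.content)
```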


## What's Inside

- :material-fast-forward: **Getting Started**

  Install TokenPak and run your first compressed request in 5 minutes.

- :material-console: **CLI Reference**

  Every command, every flag, with examples.

- :material-lan: **Proxy Setup**

  Connect Claude Code, OpenAI clients, or any HTTP-based LLM tool (see the sketch after this list).

- :material-chef-hat: **Recipe Development**

  Build custom compression recipes for your domain.

- :material-chart-bar: **Telemetry & Dashboard**

  Track costs, view savings, export reports.

- :material-server: **Team Server**

  Deploy a shared TokenPak instance for your whole team.
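
For tools whose source you can't edit, most OpenAI-compatible clients can be redirected at startup instead. The sketch below relies on the `openai` SDK's standard `OPENAI_BASE_URL` environment variable; whether TokenPak serves an OpenAI-compatible API under `/v1` is an assumption, so check Proxy Setup for the supported routes.

```python
# Sketch: the official openai SDK reads OPENAI_BASE_URL from the environment,
# so tools built on it can be pointed at TokenPak without code changes.
# Assumption: TokenPak serves an OpenAI-compatible API under /v1.
import os

os.environ["OPENAI_BASE_URL"] = "http://localhost:8766/v1"
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # forwarded to the provider, never stored

from openai import OpenAI

client = OpenAI()  # picks up the base URL and key from the environment
```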