
# TokenPak

Zero-token operations. Maximum context efficiency.

TokenPak is an open-source LLM proxy that compresses context, routes requests intelligently, and tracks costs — all without touching your prompts or credentials.

Python 3.11+ · License: MIT


## Why TokenPak?

LLM APIs charge per token. Most conversations are bloated with repetitive context, verbose code comments, and redundant structure. TokenPak fixes that at the proxy layer — transparently, locally, without ever seeing your content.

| Metric | Value |
| --- | --- |
| Average token reduction | 43–84% |
| Zero-token operations | 80%+ |
| Cold start overhead | < 100 ms |
| Indexing throughput | 2,700+ files/sec |

## Core Principles

- **Local-first.** We never see your prompts, code, or responses. Everything happens locally.
- **Pure passthrough.** Your API keys go directly to providers and are never stored by TokenPak.
- **No lock-in.** Downgrade anytime. Keep all your data. No vendor dependencies.
- **Free core.** Status, search, and cost reports are all free. CLI-first, deterministic.


## Quick Start

```bash
pip install tokenpak
tokenpak serve --port 8766
```

Then point your LLM client at `http://localhost:8766`. That's it. See Getting Started for the full walkthrough.
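
As a quick illustration, here is a minimal sketch of sending a request through the proxy with the official `openai` Python SDK. The `/v1` path prefix and OpenAI-compatible endpoint are assumptions about TokenPak's route layout; see Proxy Setup for the exact configuration.

```python
# Minimal sketch of routing an OpenAI SDK client through TokenPak.
# Assumption: the proxy exposes an OpenAI-compatible endpoint under /v1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8766/v1",  # TokenPak proxy instead of the provider
    api_key="sk-...",                     # your real provider key, forwarded as-is
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello through TokenPak!"}],
)
print(response.choices[0].message.content)
```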


## What's Inside

- :material-fast-forward: **Getting Started**

  Install TokenPak and run your first compressed request in 5 minutes.

- :material-console: **CLI Reference**

  Every command, every flag, with examples.

- :material-lan: **Proxy Setup**

  Connect Claude Code, OpenAI clients, or any HTTP-based LLM tool (see the sketch after this list).

- :material-chef-hat: **Recipe Development**

  Build custom compression recipes for your domain.

- :material-chart-bar: **Telemetry & Dashboard**

  Track costs, view savings, export reports.

- :material-server: **Team Server**

  Deploy a shared TokenPak instance for your whole team.
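
For tools whose source you can't edit, most OpenAI-compatible clients can be redirected at startup instead. The sketch below relies on the `openai` SDK's standard `OPENAI_BASE_URL` environment variable; whether TokenPak serves an OpenAI-compatible API under `/v1` is an assumption, so check Proxy Setup for the supported routes.

```python
# Sketch: the official openai SDK reads OPENAI_BASE_URL from the environment,
# so tools built on it can be pointed at TokenPak without code changes.
# Assumption: TokenPak serves an OpenAI-compatible API under /v1.
import os

os.environ["OPENAI_BASE_URL"] = "http://localhost:8766/v1"
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # forwarded to the provider, never stored

from openai import OpenAI

client = OpenAI()  # picks up the base URL and key from the environment
```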