TokenPak API Reference¶

Complete reference for the TokenPak proxy HTTP API, SDK adapters, and CLI.

Table of Contents¶

Proxy HTTP API
Authentication
GET Endpoints
POST Endpoints
Error Format
SDK Adapters
Base Adapter (TokenPakAdapter)
AnthropicAdapter
OpenAIAdapter
LangChainAdapter
LiteLLMAdapter
Exception Hierarchy
CLI Commands
Proxy Lifecycle
Indexing & Search
Monitoring & Stats
Diagnostics
Config Management
Advanced Commands
Configuration Reference
Environment Variables
config.yaml

Proxy HTTP API¶

The TokenPak proxy runs on localhost:8766 by default. It accepts standard HTTP requests and transparently forwards them to upstream providers after applying compression and context injection.

Authentication¶

By default, TokenPak allows unauthenticated requests from localhost. For remote clients, authentication is required via header:

Header	Value	Notes
`X-TokenPak-Key`	`<your-proxy-key>`	Required for non-localhost clients
`x-api-key`	`<provider-api-key>`	Provider key, forwarded to upstream
`Authorization`	`Bearer <token>`	Alternative to `x-api-key`

Requests from non-localhost without X-TokenPak-Key receive 401 Unauthorized.

GET Endpoints¶

`GET /`¶

Welcome / status endpoint. Returns proxy identity and available endpoints.

Response:

{
  "name": "TokenPak",
  "version": "0.5.0",
  "status": "running",
  "endpoints": {
    "health": "/health",
    "stats": "/stats",
    "docs": "/docs",
    "proxy": "/v1/messages (POST), /v1/chat/completions (POST)"
  },
  "docs": "https://github.com/kaywhy331/tokenpak"
}

`GET /health`¶

Lightweight health check. Cached for 1 second to reduce overhead.

Response:

{
  "status": "ok",
  "compilation_mode": "hybrid",
  "vault_index": {
    "available": true,
    "blocks": 42,
    "path": "/home/user/vault/.tokenpak"
  },
  "router": {
    "enabled": true,
    "rules_loaded": 5
  },
  "capsule_available": false,
  "budget": {
    "enabled": true,
    "total_tokens": 4000
  },
  "circuit_breakers": {
    "anthropic": { "open": false, "failures": 0 }
  },
  "stats": {
    "requests": 142,
    "input_tokens": 380000,
    "sent_input_tokens": 210000,
    "saved_tokens": 170000,
    "errors": 2,
    "cache_hits": 37,
    "cost": 0.85
  },
  "latency": {
    "p50_latency_ms": 320,
    "p99_latency_ms": 1840,
    "samples": 100
  }
}

Also supports HEAD /health (returns 200 with no body — useful for Kubernetes liveness probes).

`GET /stats`¶

Full session statistics. Heavier than /health — includes per-model breakdown and recent requests.

Response:

{
  "session": {
    "requests": 142,
    "input_tokens": 380000,
    "sent_input_tokens": 210000,
    "saved_tokens": 170000,
    "output_tokens": 95000,
    "cost": 0.85,
    "cost_saved": 0.42,
    "start_time": 1711584000.0,
    "errors": 2,
    "cache_hits": 37
  },
  "compilation_mode": "hybrid",
  "vault_index": {
    "available": true,
    "blocks": 42
  },
  "router": { "enabled": true },
  "today": { ... },
  "by_model": {
    "claude-sonnet-4-6": {
      "requests": 100,
      "input_tokens": 250000,
      "cost": 0.60
    }
  },
  "recent": [ ... ]
}

`GET /stats/last`¶

Per-request stats for the most recent proxied request.

Response:

{
  "request_id": "req_abc123",
  "timestamp": "2026-03-28T16:00:00Z",
  "model": "claude-sonnet-4-6",
  "tokens_saved": 1240,
  "percent_saved": 28.3,
  "cost_saved": 0.0037,
  "session_total_saved": 0.42,
  "session_requests": 142,
  "input_tokens_raw": 4380,
  "input_tokens_sent": 3140,
  "output_tokens": 512
}

Error (no requests yet):

{
  "error": "no_requests",
  "message": "No requests captured yet. Send a message to see stats."
}

`GET /stats/session`¶

Session aggregate summary with uptime and average savings.

Response:

{
  "session_requests": 142,
  "session_total_saved": 0.42,
  "tokens_saved": 170000,
  "tokens_sent": 210000,
  "tokens_raw": 380000,
  "output_tokens": 95000,
  "total_cost": 0.85,
  "uptime_hours": 4.5,
  "errors": 2,
  "avg_savings_pct": 44.7
}

`GET /savings[?since=<ISO-date>]`¶

Savings report, optionally filtered by start date.

Query Parameters:

Parameter	Type	Description
`since`	ISO-8601 date string	Filter to requests on/after this date

Example: GET /savings?since=2026-03-01

`GET /cache-stats`¶

Detailed cache hit/miss breakdown.

`GET /recent`¶

Last 50 requests with per-request metadata.

Response:

{
  "recent": [
    {
      "timestamp": "2026-03-28T16:00:00Z",
      "model": "claude-sonnet-4-6",
      "input_tokens": 4380,
      "output_tokens": 512,
      "latency_ms": 320,
      "status_code": 200,
      "tokens_saved": 1240
    }
  ]
}

`GET /trace/last`¶

Full pipeline trace for the most recent request (debugging).

Response:

{
  "request_id": "req_abc123",
  "timestamp": "2026-03-28T16:00:00Z",
  "stages": [
    { "name": "compaction", "duration_ms": 45, "tokens_before": 4380, "tokens_after": 3140 },
    { "name": "vault_inject", "duration_ms": 12, "blocks_injected": 2 },
    { "name": "upstream_forward", "duration_ms": 260, "provider": "anthropic" }
  ]
}

`GET /trace/<request_id>`¶

Pipeline trace for a specific request by ID.

`GET /traces`¶

All stored pipeline traces (up to last N requests).

Response:

{
  "traces": [ ... ],
  "count": 10
}

`GET /vault`¶

Vault index debug info — lists all indexed blocks.

Response:

{
  "available": true,
  "blocks": 42,
  "total_tokens": 185000,
  "path": "/home/user/vault/.tokenpak",
  "block_list": [
    {
      "block_id": "vault_001",
      "source_path": "04_KNOWLEDGE/concepts/tokenpak.md",
      "risk_class": "safe",
      "raw_tokens": 1240
    }
  ]
}

`GET /metrics`¶

Prometheus-compatible metrics in text format.

Content-Type: text/plain; version=0.0.4; charset=utf-8

Example output:

# HELP tokenpak_requests_total Total proxied requests
# TYPE tokenpak_requests_total counter
tokenpak_requests_total 142
tokenpak_tokens_input_total 380000
tokenpak_tokens_saved_total 170000
tokenpak_errors_total 2
tokenpak_uptime_seconds 16200

`GET /metrics/dashboard`¶

Comprehensive dashboard metrics with 8 key metrics in JSON format.

Response:

{
  "timestamp": "2026-03-28T16:00:00Z",
  "uptime_seconds": 16200,
  "requests": {
    "total": 142,
    "throughput_req_per_sec": 0.009,
    "24h_window": true
  },
  "latency": {
    "p50_ms": 320.0,
    "p95_ms": 980.0,
    "p99_ms": 1840.0,
    "avg_ms": 415.0,
    "samples": 100
  },
  "models": {
    "claude-sonnet-4-6": { "requests": 100, "input_tokens": 250000, "cost": 0.60 }
  },
  "routing": { "smart_routing_hit_rate": 0.0 },
  "cache": {
    "hit_ratio": 0.42,
    "read_tokens": 85000,
    "creation_tokens": 118000
  },
  "errors": {
    "error_rate": 0.014,
    "error_count": 2,
    "top_failures": { "429": 1, "503": 1 }
  },
  "streaming": { "count": 0, "percentage": 0.0 },
  "window_24h": {
    "input_tokens": 380000,
    "output_tokens": 95000,
    "total_cost": 0.85
  }
}

`GET /dashboard` / `GET /dashboard/<path>`¶

Serves the built-in HTML monitoring dashboard.

`GET /docs` / `GET /docs/`¶

Serves the built-in API documentation page (HTML).

`GET /openapi.yaml`¶

OpenAPI 3.0 spec for the proxy HTTP API.

POST Endpoints¶

`POST /v1/messages`¶

Anthropic Messages API — the primary proxy path for Claude models.

TokenPak intercepts this request, applies compression, and forwards to the upstream Anthropic API. The response is transparently passed back.

Headers:

Header	Value	Required
`Content-Type`	`application/json`	Yes
`x-api-key`	`<anthropic-api-key>`	Yes
`anthropic-version`	`2023-06-01`	Recommended

Request Body:

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum entanglement."
    }
  ],
  "system": "You are a helpful physics tutor.",
  "stream": false
}

Request Parameters:

Parameter	Type	Required	Description
`model`	string	Yes	Model ID (e.g. `claude-sonnet-4-6`)
`messages`	array	Yes	Conversation history — `role` + `content` pairs
`max_tokens`	integer	Yes	Maximum tokens in the response
`system`	string	No	System prompt
`stream`	boolean	No	Enable SSE streaming (default: false)
`temperature`	float	No	Sampling temperature (0.0–1.0)
`top_p`	float	No	Nucleus sampling threshold
`stop_sequences`	array	No	Custom stop strings
`tools`	array	No	Tool/function definitions
`tool_choice`	object	No	Tool selection policy

Response:

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Quantum entanglement is..." }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 3140,
    "output_tokens": 512,
    "cache_read_input_tokens": 1200,
    "cache_creation_input_tokens": 800
  }
}

`POST /v1/chat/completions`¶

OpenAI Chat Completions API — compatible path for OpenAI SDK clients, LangChain, and LiteLLM.

Headers:

Header	Value	Required
`Content-Type`	`application/json`	Yes
`Authorization`	`Bearer <api-key>`	Yes

Request Body:

{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "max_tokens": 1024,
  "stream": false
}

Request Parameters:

Parameter	Type	Required	Description
`model`	string	Yes	Model ID
`messages`	array	Yes	Message list with `role` and `content`
`max_tokens`	integer	No	Maximum response tokens
`stream`	boolean	No	Enable SSE streaming
`temperature`	float	No	Sampling temperature
`functions`	array	No	Function/tool definitions (legacy)
`tools`	array	No	Tool definitions

`POST /ingest` / `POST /ingest/batch`¶

Ingest context blocks directly into the vault index at runtime.

Request Body:

{
  "block_id": "my-context-001",
  "content": "This is important context to inject...",
  "source_path": "custom/context.md",
  "risk_class": "safe"
}

Batch variant (/ingest/batch) accepts an array of blocks.

`POST /config/reload`¶

Hot-reload configuration from environment variables (localhost only).

Equivalent to sending SIGHUP to the proxy process.

Response:

{
  "status": "ok",
  "message": "Config reloaded: TOKENPAK_MODE=hybrid, TOKENPAK_COMPACT=1"
}

Note: Only accepts requests from 127.0.0.1 or ::1. Remote calls receive 403 Forbidden.

Error Format¶

All error responses use a consistent JSON structure:

{
  "error": {
    "type": "error_type",
    "message": "Human-readable description"
  }
}

Common error types:

HTTP Status	`error.type`	Description
400	`bad_request`	Malformed request body
401	`unauthorized`	Missing or invalid `X-TokenPak-Key`
403	`forbidden`	Operation not allowed from this IP
404	`not_found`	Unknown endpoint path
405	`method_not_allowed`	Wrong HTTP method (e.g. GET on POST-only path)
429	`rate_limit_exceeded`	Too many requests from this IP
500	`internal_error`	Proxy-side error
503	`circuit_open`	Upstream provider circuit breaker open
503	`upstream_unreachable`	Cannot reach upstream provider

SDK Adapters¶

TokenPak provides adapters that route requests through the proxy while preserving the native API shape of each SDK.

Base Adapter (`TokenPakAdapter`)¶

All adapters inherit from TokenPakAdapter and implement four lifecycle hooks.

from tokenpak.adapters.base import TokenPakAdapter

Constructor Parameters:

Parameter	Type	Required	Default	Description
`base_url`	str	Yes	—	Proxy URL, e.g. `http://127.0.0.1:8766`
`api_key`	str	Yes	—	Provider API key (forwarded to upstream)
`timeout_s`	float	No	`120.0`	Request timeout in seconds

Lifecycle Methods:

Method	Signature	Description
`prepare_request`	`(request: dict) -> dict`	Validate and normalise request
`send`	`(prepared: dict) -> dict`	POST to proxy, return raw response
`parse_response`	`(response: dict) -> dict`	Convert to SDK-native format
`extract_tokens`	`(response: dict) -> dict`	Extract `{input, output, cache_read, cache_creation}` token counts

High-level call method:

# Convenience: calls prepare_request → send → parse_response
response = adapter.call(request_dict)

# Extract token usage
tokens = adapter.extract_tokens(response)
# tokens = {"input_tokens": 3140, "output_tokens": 512, ...}

AnthropicAdapter¶

Routes requests to /v1/messages on the proxy.

from tokenpak.adapters import AnthropicAdapter

adapter = AnthropicAdapter(
    base_url="http://127.0.0.1:8766",
    api_key="sk-ant-api03-...",
)

response = adapter.call({
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"}
    ],
})

print(response["content"][0]["text"])
tokens = adapter.extract_tokens(response)
print(f"Input tokens: {tokens['input_tokens']}")

Proxy Path: POST /v1/messages

Required Request Fields:

Field	Type	Description
`model`	string	Claude model ID
`messages`	list	Non-empty list of `{role, content}` dicts
`max_tokens`	integer	Maximum completion tokens

Added Defaults (if not present): - stream defaults to false

Extra Headers Sent: - anthropic-version: 2023-06-01

extract_tokens Return:

{
  "input_tokens": 3140,
  "output_tokens": 512,
  "cache_read_input_tokens": 1200,
  "cache_creation_input_tokens": 800
}

OpenAIAdapter¶

Routes requests to /v1/chat/completions on the proxy.

from tokenpak.adapters import OpenAIAdapter

adapter = OpenAIAdapter(
    base_url="http://127.0.0.1:8766",
    api_key="sk-...",
)

response = adapter.call({
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Hello, world!"}
    ],
})

Proxy Path: POST /v1/chat/completions

Required Request Fields:

Field	Type	Description
`model`	string	OpenAI model ID
`messages`	list	Non-empty list of `{role, content}` dicts

LangChainAdapter¶

Drop-in adapter for LangChain integrations.

from tokenpak.adapters import LangChainAdapter

adapter = LangChainAdapter(
    base_url="http://127.0.0.1:8766",
    api_key="sk-ant-...",
)

Constructor Parameters:

Parameter	Type	Required	Default	Description
`base_url`	str	Yes	—	Proxy URL
`api_key`	str	Yes	—	Provider API key
`timeout_s`	float	No	`120.0`	Request timeout

LiteLLMAdapter¶

Drop-in adapter for LiteLLM integrations.

from tokenpak.adapters import LiteLLMAdapter

adapter = LiteLLMAdapter(
    base_url="http://127.0.0.1:8766",
    api_key="sk-ant-...",
)

Constructor Parameters:

Parameter	Type	Required	Default	Description
`base_url`	str	Yes	—	Proxy URL
`api_key`	str	Yes	—	Provider API key
`timeout_s`	float	No	`120.0`	Request timeout

Exception Hierarchy¶

All adapters raise canonical exceptions — never raw requests exceptions.

TokenPakAdapterError (base)
├── TokenPakTimeoutError      — proxy did not respond within timeout_s
├── TokenPakConfigError       — missing required fields / bad config
└── TokenPakAuthError         — 401 or 403 from proxy

Usage:

from tokenpak.adapters.base import (
    TokenPakAdapterError,
    TokenPakTimeoutError,
    TokenPakAuthError,
    TokenPakConfigError,
)

try:
    response = adapter.call(request)
except TokenPakTimeoutError:
    print("Proxy timed out")
except TokenPakAuthError as e:
    print(f"Auth failed: {e} (HTTP {e.status_code})")
except TokenPakConfigError as e:
    print(f"Config error: {e}")
except TokenPakAdapterError as e:
    print(f"Adapter error: {e} (HTTP {e.status_code})")

CLI Commands¶

All commands are invoked as tokenpak <command> [options].

Proxy Lifecycle¶

`tokenpak start`¶

Start the proxy (default: localhost:8766).

tokenpak start                    # Start on default port 8766
tokenpak start --port 9000        # Custom port
tokenpak start --debug            # Verbose logging
tokenpak start --background       # Run as background daemon

Options:

Option	Type	Default	Description
`--port`	int	`8766`	Port to listen on
`--debug`	flag	off	Enable debug logging
`--background`	flag	off	Daemonize the process

`tokenpak stop`¶

Stop the running proxy process.

tokenpak stop

`tokenpak restart`¶

Restart the proxy (stop + start).

tokenpak restart

`tokenpak logs`¶

Show recent proxy log output.

tokenpak logs                # Last 50 lines
tokenpak logs -n 100         # Last 100 lines
tokenpak logs --follow       # Stream new log lines (tail -f)

Options:

Option	Type	Default	Description
`-n`, `--lines`	int	`50`	Number of log lines to show

`tokenpak status`¶

Show system status and recent retry events.

tokenpak status

`tokenpak version`¶

Show current versions (proxy, config, CLI).

tokenpak version

`tokenpak update`¶

Update TokenPak to latest version from git/PyPI.

tokenpak update

Indexing & Search¶

`tokenpak index [directory]`¶

Index a directory for vault-based context injection.

tokenpak index ~/vault           # Index the vault
tokenpak index ~/vault --watch   # Watch and auto-reindex on changes
tokenpak index --status          # Show indexed file count by type
tokenpak index -w 8              # Use 8 parallel workers

Options:

Option	Type	Default	Description
`directory`	path	—	Directory to index (positional)
`--status`	flag	off	Show indexed file count by type
`--workers`, `-w`	int	`4`	Parallel indexing workers
`--watch`	flag	off	Watch for file changes and auto-reindex
`--recalibrate`	flag	off	Run worker calibration before indexing
`--max-workers`	int	`8`	Worker cap for auto-calibration

`tokenpak search <query>`¶

Search the indexed vault content using BM25.

tokenpak search "compression budget"
tokenpak search "rate limits" --top 10

Options:

Option	Type	Default	Description
`query`	string	—	Search query (positional)

`tokenpak calibrate`¶

Calibrate the optimal worker count for parallel indexing on this host.

tokenpak calibrate

Monitoring & Stats¶

`tokenpak stats`¶

Show registry statistics (request counts, token usage, cost breakdown).

tokenpak stats
tokenpak stats --raw             # JSON output

`tokenpak models`¶

Show per-model usage and efficiency breakdown.

tokenpak models                      # Summary table
tokenpak models --detail sonnet      # Detailed view for models matching "sonnet"
tokenpak models --raw                # JSON output

Options:

Option	Type	Description
`--detail`	string	Show details for a specific model (partial match)
`--raw`	flag	Output as JSON

`tokenpak savings[?since=<date>]`¶

Show savings summary — tokens and cost saved by compression.

tokenpak savings
tokenpak savings --since 2026-03-01

`tokenpak usage`¶

Show model usage summary.

tokenpak usage

`tokenpak compare`¶

Show before/after cost comparison for the last proxied request.

tokenpak compare

`tokenpak leaderboard`¶

Show per-model efficiency ranking (savings rate, cost per token).

tokenpak leaderboard

`tokenpak report`¶

Generate a daily savings report.

tokenpak report

`tokenpak requests`¶

Live request explorer — browse recent proxied requests interactively.

tokenpak requests

`tokenpak timeline`¶

View savings trend over the last 7 or 30 days.

tokenpak timeline

`tokenpak attribution`¶

View savings broken down by agent, skill, and model.

tokenpak attribution

`tokenpak aggregate`¶

Aggregate request ledger data across multiple machines.

tokenpak aggregate

`tokenpak monitor`¶

Start the live monitor dashboard on port 8767.

tokenpak monitor
tokenpak monitor --port 8768     # Custom port

`tokenpak dashboard`¶

Real-time health dashboard (TUI) or serve the public web dashboard URL.

tokenpak dashboard               # TUI view
tokenpak dashboard --public      # Open web dashboard in browser

`tokenpak check-alerts`¶

Evaluate alert rules and report any health violations.

tokenpak check-alerts

Diagnostics¶

`tokenpak doctor`¶

Run comprehensive system diagnostics.

tokenpak doctor

Checks: - Proxy connectivity (port 8766) - Upstream provider reachability - API key validity - Vault index health - Config file validity

`tokenpak preview <file>`¶

Preview compression dry-run on a file — shows token savings before sending to API.

tokenpak preview prompt.txt
tokenpak preview --mode aggressive prompt.txt

`tokenpak debug on|off|status`¶

Toggle verbose debug logging or check current debug state.

tokenpak debug on
tokenpak debug off
tokenpak debug status

`tokenpak learn status`¶

Show learned compression patterns from telemetry.

tokenpak learn status

`tokenpak learn reset`¶

Clear all learned data and reset to baseline.

tokenpak learn reset

`tokenpak replay`¶

List, inspect, and re-run captured sessions (zero API cost).

tokenpak replay list             # List recent captured sessions
tokenpak replay show <id>        # Show full details
tokenpak replay run <id>         # Re-run with different settings
tokenpak replay clear            # Remove all entries

`tokenpak validate <file>`¶

Validate a TokenPak JSON file against the v1.0 schema.

tokenpak validate my-config.json

`tokenpak diff`¶

Show context changes (removed/compressed/retained blocks) for a request.

tokenpak diff

`tokenpak vault-health`¶

Vault index health diagnostic and repair.

tokenpak vault-health            # Check index health
tokenpak vault-health repair     # Rebuild stale vault index

Config Management¶

`tokenpak setup`¶

Interactive first-time configuration wizard.

tokenpak setup

`tokenpak config`¶

Config management subcommands.

tokenpak config show             # Show merged config (file + env overrides)
tokenpak config sync             # Sync config from canonical source
tokenpak config pull             # Pull config from git or URL
tokenpak config validate         # Validate config against schema
tokenpak config init             # Create default config.yaml
tokenpak config path             # Print config file path

`tokenpak route`¶

Manage manual model routing rules.

tokenpak route list              # List routing rules
tokenpak route add               # Add a rule
tokenpak route remove <id>       # Remove a rule

Advanced Commands¶

`tokenpak serve`¶

Start monitoring proxy or telemetry ingest server.

tokenpak serve                   # Standard proxy
tokenpak serve --telemetry       # Telemetry ingest server
tokenpak serve --ingest          # Phase 5A ingest API server
tokenpak serve --workers 2       # Multiple uvicorn workers

`tokenpak benchmark`¶

Benchmark compression performance.

tokenpak benchmark               # Built-in sample data
tokenpak benchmark --file prompt.txt
tokenpak benchmark --latency ~/vault   # Latency/indexing benchmark
tokenpak benchmark --json        # JSON output

`tokenpak macro`¶

Manage and run compression macros.

tokenpak macro list              # List all macros
tokenpak macro run <name>        # Run a macro
tokenpak macro create            # Create a user-defined YAML macro
tokenpak macro show <name>       # Show macro definition
tokenpak macro delete <name>     # Delete a user-defined macro

`tokenpak recipe`¶

Manage compression recipes (YAML workflow definitions).

tokenpak recipe create           # Scaffold a new recipe YAML
tokenpak recipe validate <file>  # Validate recipe against schema
tokenpak recipe test <file>      # Test recipe against sample input
tokenpak recipe benchmark <file> # Benchmark recipe performance

`tokenpak fleet`¶

Manage and query a multi-machine proxy fleet.

tokenpak fleet init              # Configure fleet interactively
tokenpak fleet status            # Show fleet health
tokenpak fleet list              # List fleet members

`tokenpak template`¶

Manage local user prompt templates.

tokenpak template list
tokenpak template add <name>     # Add or update a template
tokenpak template show <name>    # Display a template
tokenpak template remove <name>  # Delete a template
tokenpak template use <name>     # Expand a template with variables

`tokenpak audit`¶

Enterprise audit log management.

tokenpak audit list              # List audit log entries
tokenpak audit export            # Export to file
tokenpak audit verify            # Verify hash chain integrity
tokenpak audit prune             # Remove old entries
tokenpak audit summary           # Show audit stats

Configuration Reference¶

Environment Variables¶

The proxy reads configuration from ~/.tokenpak/config.yaml with environment variable overrides. Environment variables always take precedence.

Core Settings¶

Variable	Default	Description
`TOKENPAK_PORT`	`8766`	Proxy listen port
`TOKENPAK_MODE`	`hybrid`	Compression mode: `strict`, `hybrid`, `aggressive`
`TOKENPAK_COMPACT`	`1`	Master on/off switch (0 = disable all compression)
`TOKENPAK_DB`	`.tokenpak/monitor.db`	SQLite database path

Compression Settings¶

Variable	Default	Description
`TOKENPAK_COMPACT_MAX_CHARS`	`120`	Maximum chars for compressed text chunks
`TOKENPAK_COMPACT_THRESHOLD_TOKENS`	`4500`	Skip compression below this token count
`TOKENPAK_COMPACT_CACHE_SIZE`	`2000`	Compression result cache entries
`TOKENPAK_MAX_COMPRESSION_TIME_MS`	`5000`	Max compression time before skipping (0 = no cap)

Vault Context Injection¶

Variable	Default	Description
`TOKENPAK_VAULT_INDEX`	`~/vault/.tokenpak`	Path to vault index directory
`TOKENPAK_INJECT_BUDGET`	`4000`	Max tokens to inject from vault per request
`TOKENPAK_INJECT_TOP_K`	`5`	Max vault blocks to inject per request
`TOKENPAK_INJECT_MIN_SCORE`	`2.0`	Minimum BM25 score to include a block
`TOKENPAK_RETRIEVAL_BACKEND`	`json_blocks`	Vault backend: `json_blocks` or `sqlite`

Key Management¶

Variable	Default	Description
`ANTHROPIC_API_KEY`	—	Primary Anthropic API key
`ANTHROPIC_OAUTH_TOKEN`	—	Rotation key 2
`ANTHROPIC_OAUTH_TOKEN2`	—	Rotation key 3
`TOKENPAK_KEY_ROTATION`	`failover`	Key rotation mode: `failover` or `roundrobin`
`TOKENPAK_KEY_COOLDOWN_429`	`60`	Rate-limit cooldown seconds
`TOKENPAK_KEY_COOLDOWN_401`	`300`	Invalid-key cooldown seconds

Advanced Features¶

Variable	Default	Description
`TOKENPAK_CAPSULE_BUILDER`	`0`	Enable capsule builder stage (`0` or `1`)
`TOKENPAK_CAPSULE_MIN_CHARS`	`400`	Min chars for a block to be capsulised
`TOKENPAK_ROUTER_ENABLED`	`true`	Enable smart model router
`TOKENPAK_HTTP100_KEEPALIVE`	`0`	Send HTTP 100 Continue before compression

config.yaml¶

Default location: ~/.tokenpak/config.yaml

# TokenPak configuration
# All settings can also be overridden via environment variables

compression:
  enabled: true
  mode: hybrid               # strict | hybrid | aggressive
  max_chars: 120             # Max chars per compressed chunk
  threshold_tokens: 4500     # Skip compression below this token count

cache:
  enabled: true
  type: memory               # memory | disk
  ttl_seconds: 3600
  max_size_mb: 256

logging:
  enabled: true
  level: info                # debug | info | warning | error
  destination: file          # file | stdout
  retention_days: 30
  include_request_body: false
  include_response_body: false

metrics:
  enabled: true
  collection_window_seconds: 60
  retention_days: 7

security:
  require_api_key: false
  api_key: null              # X-TokenPak-Key for non-localhost clients
  allowed_origins:
    - "*"
  rate_limit_per_minute: 1000

advanced:
  vault_index_path: "~/.tokenpak/.index"
  enable_trace_logs: false
  proxy_timeout_seconds: 30
  max_request_size_bytes: 10485760    # 10 MB

This reference covers TokenPak v0.5.0. For the full changelog, see CHANGELOG.md.

TokenPak API Reference¶

Table of Contents¶

Proxy HTTP API¶

Authentication¶

GET Endpoints¶

GET /¶

GET /health¶

GET /stats¶

GET /stats/last¶

GET /stats/session¶

GET /savings[?since=<ISO-date>]¶

GET /cache-stats¶

GET /recent¶

GET /trace/last¶

GET /trace/<request_id>¶

GET /traces¶

GET /vault¶

GET /metrics¶

GET /metrics/dashboard¶

GET /dashboard / GET /dashboard/<path>¶

GET /docs / GET /docs/¶

GET /openapi.yaml¶

POST Endpoints¶

POST /v1/messages¶

POST /v1/chat/completions¶

POST /ingest / POST /ingest/batch¶

POST /config/reload¶

Error Format¶

SDK Adapters¶

Base Adapter (TokenPakAdapter)¶

AnthropicAdapter¶

OpenAIAdapter¶

LangChainAdapter¶

LiteLLMAdapter¶

Exception Hierarchy¶

CLI Commands¶

Proxy Lifecycle¶

tokenpak start¶

tokenpak stop¶

tokenpak restart¶

tokenpak logs¶

tokenpak status¶

tokenpak version¶

tokenpak update¶

Indexing & Search¶

tokenpak index [directory]¶

tokenpak search <query>¶

tokenpak calibrate¶

Monitoring & Stats¶

tokenpak stats¶

tokenpak models¶

tokenpak savings[?since=<date>]¶

tokenpak usage¶

tokenpak compare¶

tokenpak leaderboard¶

tokenpak report¶

tokenpak requests¶

tokenpak timeline¶

tokenpak attribution¶

tokenpak aggregate¶

tokenpak monitor¶

tokenpak dashboard¶

tokenpak check-alerts¶

Diagnostics¶

tokenpak doctor¶

tokenpak preview <file>¶

tokenpak debug on|off|status¶

tokenpak learn status¶

tokenpak learn reset¶

tokenpak replay¶

tokenpak validate <file>¶

tokenpak diff¶

tokenpak vault-health¶

Config Management¶

tokenpak setup¶

tokenpak config¶

tokenpak route¶

Advanced Commands¶

tokenpak serve¶

tokenpak benchmark¶

`GET /`¶

`GET /health`¶

`GET /stats`¶

`GET /stats/last`¶

`GET /stats/session`¶

`GET /savings[?since=<ISO-date>]`¶

`GET /cache-stats`¶

`GET /recent`¶

`GET /trace/last`¶

`GET /trace/<request_id>`¶

`GET /traces`¶

`GET /vault`¶

`GET /metrics`¶

`GET /metrics/dashboard`¶

`GET /dashboard` / `GET /dashboard/<path>`¶

`GET /docs` / `GET /docs/`¶

`GET /openapi.yaml`¶

`POST /v1/messages`¶

`POST /v1/chat/completions`¶

`POST /ingest` / `POST /ingest/batch`¶

`POST /config/reload`¶

Base Adapter (`TokenPakAdapter`)¶

`tokenpak start`¶

`tokenpak stop`¶

`tokenpak restart`¶

`tokenpak logs`¶

`tokenpak status`¶

`tokenpak version`¶

`tokenpak update`¶

`tokenpak index [directory]`¶

`tokenpak search <query>`¶

`tokenpak calibrate`¶

`tokenpak stats`¶

`tokenpak models`¶

`tokenpak savings[?since=<date>]`¶

`tokenpak usage`¶

`tokenpak compare`¶

`tokenpak leaderboard`¶

`tokenpak report`¶

`tokenpak requests`¶

`tokenpak timeline`¶

`tokenpak attribution`¶

`tokenpak aggregate`¶

`tokenpak monitor`¶

`tokenpak dashboard`¶

`tokenpak check-alerts`¶

`tokenpak doctor`¶

`tokenpak preview <file>`¶

`tokenpak debug on|off|status`¶

`tokenpak learn status`¶

`tokenpak learn reset`¶

`tokenpak replay`¶

`tokenpak validate <file>`¶

`tokenpak diff`¶

`tokenpak vault-health`¶

`tokenpak setup`¶

`tokenpak config`¶

`tokenpak route`¶

`tokenpak serve`¶

`tokenpak benchmark`¶

`tokenpak macro`¶

`tokenpak recipe`¶

`tokenpak fleet`¶

`tokenpak template`¶

`tokenpak audit`¶