# Proxy Setup Guide
How to connect any LLM client to TokenPak.
## How the Proxy Works

TokenPak acts as an HTTP proxy between your LLM client and the provider API:

1. Your client sends a normal API request to `localhost:8766`
2. TokenPak compresses the prompt (if beneficial)
3. The compressed request is forwarded to the real provider
4. The response comes back (with an optional stats footer)
5. Cost and token data are recorded locally
Your API key is in the Authorization header and passes through untouched. TokenPak never reads, stores, or logs credentials.
## Start the Proxy

Options:

```bash
tokenpak serve \
  --port 8766 \
  --mode hybrid \
  --daemon
```

`--mode` accepts `strict`, `hybrid`, or `aggressive`; `--daemon` runs the proxy in the background.
Compression modes:

| Mode | When to use |
|---|---|
| `strict` | Only compress when clearly beneficial (>4500 tokens) |
| `hybrid` | Balance compression and latency (recommended) |
| `aggressive` | Maximum compression, every request |
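For example, a latency-sensitive session can run in `strict` mode so that only large prompts are touched (flags as documented above):

```bash
# Only prompts over the threshold (default 4500 tokens) get compressed.
tokenpak serve --port 8766 --mode strict
```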
## Provider Auto-Detection

TokenPak detects the target provider from your request headers and routes accordingly:

| Your Authorization format | Routes to |
|---|---|
| `Bearer sk-ant-...` | `api.anthropic.com` |
| `Bearer sk-...` | `api.openai.com` |
| Custom headers | Configurable via `proxy.passthrough_url` |
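For example, a request carrying an Anthropic-style key is forwarded to `api.anthropic.com`. A sketch (the model name is a placeholder; the `anthropic-version` and `max_tokens` fields follow Anthropic's Messages API, and the Authorization-based routing follows the table above):

```bash
# The sk-ant- prefix routes this request to api.anthropic.com.
curl http://localhost:8766/v1/messages \
  -H "Authorization: Bearer sk-ant-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 256,
       "messages": [{"role": "user", "content": "Hello!"}]}'
```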
## Setup by Client

### Claude Code
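A common approach, assuming your Claude Code build honors the standard `ANTHROPIC_BASE_URL` environment variable, is to point it at the proxy:

```bash
# Route Claude Code traffic through the TokenPak proxy.
# The API key stays in your normal config and passes through untouched.
export ANTHROPIC_BASE_URL=http://localhost:8766
claude
```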
### OpenAI Python SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:8766/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
### OpenClaw

In your `openclaw.json`:

```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "http://localhost:8766"
    },
    "openai": {
      "baseUrl": "http://localhost:8766/v1"
    }
  }
}
```
### Cursor / Copilot

Set the OpenAI base URL in settings:
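The exact location varies by client and version; in Cursor, for instance, the override typically lives in the model settings as an OpenAI base URL field. Point it at `http://localhost:8766/v1` and keep your existing API key.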
### Generic HTTP Client

Replace any provider base URL with `http://localhost:8766`. All standard REST API paths work as-is. TokenPak is protocol-transparent.
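For instance, an OpenAI-style chat completion over plain `curl` (illustrative request; substitute your real key and model):

```bash
# Same path and body you would send to api.openai.com; only the host changes.
curl http://localhost:8766/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```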
## Multi-Provider Setup

Use different upstream URLs per provider:

```json
{
  "proxy": {
    "port": 8766,
    "providers": {
      "anthropic": { "url": "https://api.anthropic.com" },
      "openai": { "url": "https://api.openai.com" },
      "custom": { "url": "https://your-openai-compatible.api" }
    }
  }
}
```
## Compression Configuration

Control what gets compressed via `~/.tokenpak/config.json`:

```json
{
  "compression": {
    "enabled": true,
    "level": "balanced",
    "threshold_tokens": 4500,
    "preserve_code": true,
    "preserve_json": false
  }
}
```

Or via CLI:

```bash
tokenpak config set compression.level aggressive
tokenpak config set compression.threshold_tokens 2000
```
## Response Footer (Stats Injection)

By default, TokenPak appends a one-line stats footer to each response, summarizing the token and cost data it records for that request.
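To disable it, toggle the footer off in config. The exact key isn't documented here; `footer.enabled` below is an assumption, following the `tokenpak config set` pattern shown above:

```bash
# Hypothetical key: check your TokenPak version for the actual footer setting.
tokenpak config set footer.enabled false
```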
## Environment Variables

Override any config value with env vars:

| Variable | Default | Description |
|---|---|---|
| `TOKENPAK_PORT` | `8766` | Proxy listen port |
| `TOKENPAK_MODE` | `hybrid` | Compression mode |
| `TOKENPAK_COMPACT` | `1` | Master compression switch (`0`/`1`) |
| `TOKENPAK_COMPACT_THRESHOLD_TOKENS` | `4500` | Min tokens to trigger compression |
| `TOKENPAK_DB` | `.ocp/monitor.db` | Database path |
## Systemd Service (Linux)

Run TokenPak as a user-level systemd service:

```bash
# Install the service unit
tokenpak service install

# Start and enable
systemctl --user enable tokenpak
systemctl --user start tokenpak

# Check status
systemctl --user status tokenpak
```

The service file is written to `~/.config/systemd/user/tokenpak.service`.
## Troubleshooting

**Proxy not reachable:** confirm the proxy is running (`tokenpak serve`) and that your client points at the right port (default `8766`; check `TOKENPAK_PORT`).

**Requests not being compressed:** the prompt is likely under the threshold (default 4500 tokens). Lower `compression.threshold_tokens` or switch to `aggressive` mode.

**Higher latency than expected:** compression adds per-request work. Switch to `strict` mode or raise the threshold so small requests pass through untouched.

**API key errors:** TokenPak forwards the `Authorization` header untouched, so test the key directly against the provider and check that its prefix matches the auto-detection table above.