# Proxy Setup Guide
How to connect any LLM client to TokenPak.
## How the Proxy Works

TokenPak acts as an HTTP proxy between your LLM client and the provider API:

1. Your client sends a normal API request to `localhost:8766`
2. TokenPak compresses the prompt (if beneficial)
3. The compressed request is forwarded to the real provider
4. The response comes back (with an optional stats footer)
5. Cost and token data are recorded locally
Your API key is in the Authorization header and passes through untouched. TokenPak never reads, stores, or logs credentials.
## Start the Proxy

Options:

```bash
tokenpak serve \
  --port 8766 \
  --mode hybrid \
  --daemon
```

`--mode` accepts `strict`, `hybrid`, or `aggressive`; `--daemon` runs the proxy in the background.
Compression modes:

| Mode | When to use |
|---|---|
| `strict` | Only compress when clearly beneficial (>4500 tokens) |
| `hybrid` | Balance compression and latency (recommended) |
| `aggressive` | Maximum compression, every request |
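For example, a latency-sensitive session can run in `strict` mode so that only large prompts are touched (flags as documented above):

```bash
# Only prompts over the threshold (default 4500 tokens) get compressed.
tokenpak serve --port 8766 --mode strict
```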
## Provider Auto-Detection

TokenPak detects the target provider from your request headers and routes accordingly:

| Your Authorization format | Routes to |
|---|---|
| `Bearer sk-ant-...` | `api.anthropic.com` |
| `Bearer sk-...` | `api.openai.com` |
| Custom headers | Configurable via `proxy.passthrough_url` |
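For example, a request carrying an Anthropic-style key is forwarded to `api.anthropic.com`. A sketch (the model name is a placeholder; the `anthropic-version` and `max_tokens` fields follow Anthropic's Messages API, and the Authorization-based routing follows the table above):

```bash
# The sk-ant- prefix routes this request to api.anthropic.com.
curl http://localhost:8766/v1/messages \
  -H "Authorization: Bearer sk-ant-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 256,
       "messages": [{"role": "user", "content": "Hello!"}]}'
```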
## Setup by Client

### Claude Code
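A common approach, assuming your Claude Code build honors the standard `ANTHROPIC_BASE_URL` environment variable, is to point it at the proxy:

```bash
# Route Claude Code traffic through the TokenPak proxy.
# The API key stays in your normal config and passes through untouched.
export ANTHROPIC_BASE_URL=http://localhost:8766
claude
```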
### OpenAI Python SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:8766/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
### OpenClaw

In your `openclaw.json`:

```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "http://localhost:8766"
    },
    "openai": {
      "baseUrl": "http://localhost:8766/v1"
    }
  }
}
```
### Cursor / Copilot

Set the OpenAI base URL in settings:
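The exact location varies by client and version; in Cursor, for instance, the override typically lives in the model settings as an OpenAI base URL field. Point it at `http://localhost:8766/v1` and keep your existing API key.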
### Generic HTTP Client

Replace any provider base URL with `http://localhost:8766`. All standard REST API paths work as-is. TokenPak is protocol-transparent.
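For instance, an OpenAI-style chat completion over plain `curl` (illustrative request; substitute your real key and model):

```bash
# Same path and body you would send to api.openai.com; only the host changes.
curl http://localhost:8766/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```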
## Multi-Provider Setup

Use different upstream URLs per provider:

```json
{
  "proxy": {
    "port": 8766,
    "providers": {
      "anthropic": { "url": "https://api.anthropic.com" },
      "openai": { "url": "https://api.openai.com" },
      "custom": { "url": "https://your-openai-compatible.api" }
    }
  }
}
```
## Compression Configuration

Control what gets compressed via `~/.tokenpak/config.json`:

```json
{
  "compression": {
    "enabled": true,
    "level": "balanced",
    "threshold_tokens": 4500,
    "preserve_code": true,
    "preserve_json": false
  }
}
```

Or via CLI:

```bash
tokenpak config set compression.level aggressive
tokenpak config set compression.threshold_tokens 2000
```
## Response Footer (Stats Injection)

By default, TokenPak appends a one-line stats footer to each response, summarizing the token and cost data it records for that request.
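To disable it, toggle the footer off in config. The exact key isn't documented here; `footer.enabled` below is an assumption, following the `tokenpak config set` pattern shown above:

```bash
# Hypothetical key: check your TokenPak version for the actual footer setting.
tokenpak config set footer.enabled false
```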
## Environment Variables

Override any config value with env vars:

| Variable | Default | Description |
|---|---|---|
| `TOKENPAK_PORT` | `8766` | Proxy listen port |
| `TOKENPAK_MODE` | `hybrid` | Compression mode |
| `TOKENPAK_COMPACT` | `1` | Master compression switch (`0`/`1`) |
| `TOKENPAK_COMPACT_THRESHOLD_TOKENS` | `4500` | Min tokens to trigger compression |
| `TOKENPAK_DB` | `.ocp/monitor.db` | Database path |
## Systemd Service (Linux)

Run TokenPak as a user-level systemd service:

```bash
# Install the service unit
tokenpak service install

# Start and enable
systemctl --user enable tokenpak
systemctl --user start tokenpak

# Check status
systemctl --user status tokenpak
```

The service file is written to `~/.config/systemd/user/tokenpak.service`.
## Troubleshooting

**Proxy not reachable:** confirm the proxy is running (`tokenpak serve`) and that your client points at the right port (default `8766`; check `TOKENPAK_PORT`).

**Requests not being compressed:** the prompt is likely under the threshold (default 4500 tokens). Lower `compression.threshold_tokens` or switch to `aggressive` mode.

**Higher latency than expected:** compression adds per-request work. Switch to `strict` mode or raise the threshold so small requests pass through untouched.

**API key errors:** TokenPak forwards the `Authorization` header untouched, so test the key directly against the provider and check that its prefix matches the auto-detection table above.