TokenPak Deployment Guide¶
This guide covers production deployment of the TokenPak LLM proxy — from a single machine to load-balanced multi-instance setups.
Quick reference: For basic local install, see the root DEPLOYMENT.md. This guide focuses on hardened, production-grade deployments.
System Requirements¶
Minimum Hardware¶
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 1 core | 2+ cores |
| RAM | 256 MB | 512 MB–1 GB |
| Disk | 100 MB | 1 GB+ (telemetry DB + vault index) |
| Python | 3.10+ | 3.11+ |
| OS | Linux / macOS / Windows | Linux (Ubuntu 22.04 LTS+) |
RAM is low because TokenPak is a lightweight async proxy. The main consumer is the optional vault index — budget ~100 MB per 10,000 indexed files.
Network Requirements¶
| Port | Protocol | Purpose | Required |
|---|---|---|---|
| 8766 | TCP | Proxy ingress (clients → TokenPak) | Yes |
| 8766/dashboard | HTTP | Web dashboard | Optional |
| 443 | TCP (outbound) | LLM provider APIs | Yes |
Firewall rules (ufw example):
# Allow proxy port only from trusted IPs/networks
sudo ufw allow from 10.0.0.0/8 to any port 8766 proto tcp
# Deny public access to the proxy (it holds your API keys)
sudo ufw deny 8766
# Outbound HTTPS must be allowed
sudo ufw allow out 443/tcp
⚠️ Never expose port 8766 to the public internet. The proxy forwards requests with your API keys. Treat it like a database port.
Installation¶
Option 1: pip (recommended)¶
pip install tokenpak
# With optional extras
pip install tokenpak[tiktoken] # accurate token counting (recommended)
pip install tokenpak[ml] # ML-powered compression via LLMLingua
Option 2: From Source¶
git clone https://github.com/tokenpak/tokenpak
cd tokenpak
pip install -e .
# With extras
pip install -e ".[tiktoken,ml]"
Option 3: Docker¶
# Pull from registry
docker pull tokenpak/tokenpak:latest
# Or build from source
git clone https://github.com/tokenpak/tokenpak
cd tokenpak
docker build -t tokenpak:local .
Run the container:
docker run -d \
--name tokenpak \
--restart unless-stopped \
-p 127.0.0.1:8766:8766 \
-e ANTHROPIC_API_KEY="sk-ant-..." \
-e OPENAI_API_KEY="sk-..." \
-v tokenpak-data:/home/tokenpak/.tokenpak \
tokenpak/tokenpak:latest
Binding to 127.0.0.1:8766 keeps the port local-only. Use a reverse proxy (nginx, Caddy) for external access.
Verify Install¶
tokenpak --version
tokenpak doctor # checks Python version, deps, config
tokenpak status # verify proxy is reachable
Configuration¶
Config File¶
Default location: ~/.tokenpak/config.json
{
"proxy": {
"port": 8766,
"host": "127.0.0.1",
"passthrough_url": "https://api.openai.com"
},
"compression": {
"enabled": true,
"level": "balanced",
"threshold_tokens": 4500
},
"budget": {
"monthly_usd": 100,
"alert_at_pct": 80
},
"vault": {
"db_path": "~/.tokenpak/registry.db",
"watch": false
},
"stats_footer": false,
"debug": false
}
Environment Variables¶
Environment variables take priority over config file values.
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | — | Anthropic API key (forwarded to provider) |
| OPENAI_API_KEY | — | OpenAI API key |
| GOOGLE_API_KEY | — | Google Gemini API key |
| TOKENPAK_PORT | 8766 | Proxy listen port |
| TOKENPAK_HOST | 127.0.0.1 | Bind address (0.0.0.0 for all interfaces) |
| TOKENPAK_MODE | hybrid | Compression mode: strict, hybrid, aggressive |
| TOKENPAK_COMPACT | 1 | Master compression switch (0 to disable) |
| TOKENPAK_COMPACT_THRESHOLD_TOKENS | 4500 | Min tokens before compression activates |
| TOKENPAK_DB | .ocp/monitor.db | SQLite telemetry database path |
| TOKENPAK_STATS_FOOTER | 0 | Append savings summary to responses |
| TOKENPAK_DEBUG | 0 | Enable debug logging |
| TOKENPAK_METRICS_ENABLED | 0 | Opt-in anonymous usage metrics |
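Because env vars take priority, a value exported for a single run overrides the config file without editing it:

```shell
# config.json sets "port": 8766; this run listens on 9000 instead
TOKENPAK_PORT=9000 tokenpak serve
```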
Security Best Practices for Secrets¶
Never hardcode API keys in config files or code. Use one of these approaches:
Option A: Environment file (recommended for systemd)¶
# Create protected env file
sudo mkdir -p /etc/tokenpak
sudo touch /etc/tokenpak/secrets.env
sudo chmod 600 /etc/tokenpak/secrets.env
sudo chown tokenpak:tokenpak /etc/tokenpak/secrets.env
# Add secrets
echo "ANTHROPIC_API_KEY=sk-ant-..." | sudo tee -a /etc/tokenpak/secrets.env
echo "OPENAI_API_KEY=sk-..." | sudo tee -a /etc/tokenpak/secrets.env
Reference in systemd: EnvironmentFile=/etc/tokenpak/secrets.env
Option B: System keyring (desktop/dev machines)¶
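TokenPak's keyring integration is not documented in this guide, so here is a generic sketch using libsecret's `secret-tool` (GNOME Keyring); the `service`/`key` attribute names are illustrative, not TokenPak-defined:

```shell
# Store the key once; secret-tool prompts for the secret interactively
secret-tool store --label="TokenPak Anthropic key" service tokenpak key anthropic

# At launch, pull the key into the environment instead of writing it to disk
export ANTHROPIC_API_KEY="$(secret-tool lookup service tokenpak key anthropic)"
tokenpak serve
```

This keeps keys out of plaintext files, at the cost of requiring an unlocked keyring, which is why it suits desktop and dev machines rather than headless servers.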
Option C: Cloud secrets manager (production)¶
- AWS: Secrets Manager or Parameter Store
- GCP: Secret Manager
- Azure: Key Vault
Inject at runtime via your deployment tooling (e.g., aws secretsmanager get-secret-value | jq -r ...).
Option D: Docker secrets¶
echo "sk-ant-..." | docker secret create anthropic_api_key -
docker service create \
--secret anthropic_api_key \
--env ANTHROPIC_API_KEY_FILE=/run/secrets/anthropic_api_key \
tokenpak
Running as a Service¶
systemd (Linux — recommended)¶
Create a dedicated user:
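A minimal sketch; the home directory matches the `ReadWritePaths=` entry in the unit file:

```shell
# System account with no login shell; home doubles as the writable state directory
sudo useradd --system --create-home --home-dir /var/lib/tokenpak \
  --shell /usr/sbin/nologin tokenpak
```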
Create the service file at /etc/systemd/system/tokenpak.service:
[Unit]
Description=TokenPak LLM Proxy
Documentation=https://github.com/tokenpak/tokenpak
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=tokenpak
Group=tokenpak
# Load API keys from protected file
EnvironmentFile=/etc/tokenpak/secrets.env
# Compression settings
Environment=TOKENPAK_MODE=hybrid
Environment=TOKENPAK_COMPACT=1
Environment=PYTHONUNBUFFERED=1
ExecStart=/usr/local/bin/tokenpak serve --port 8766
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s
StartLimitInterval=60s
StartLimitBurst=3
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=tokenpak
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/tokenpak
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable tokenpak
sudo systemctl start tokenpak
# Verify
sudo systemctl status tokenpak
sudo journalctl -u tokenpak -f
Docker Compose¶
docker-compose.yml:
version: "3.9"
services:
tokenpak:
image: tokenpak/tokenpak:latest
container_name: tokenpak
restart: unless-stopped
ports:
- "127.0.0.1:8766:8766"
environment:
- TOKENPAK_MODE=hybrid
- TOKENPAK_COMPACT=1
- TOKENPAK_PORT=8766
env_file:
- .env.secrets # ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.
volumes:
- tokenpak-data:/home/tokenpak/.tokenpak
healthcheck:
test: ["CMD", "tokenpak", "status"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
# Optional: nginx reverse proxy for TLS termination
nginx:
image: nginx:alpine
restart: unless-stopped
ports:
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/conf.d/tokenpak.conf:ro
- ./certs:/etc/nginx/certs:ro
depends_on:
- tokenpak
volumes:
tokenpak-data:
.env.secrets (chmod 600, never commit):
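The env file is plain KEY=value lines (placeholder values mirror those used above):

```shell
# .env.secrets -- never commit; add it to .gitignore
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
```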
Start:
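```shell
docker compose up -d
docker compose ps                # wait for the healthcheck to report "healthy"
docker compose logs -f tokenpak
```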
Monitoring & Logs¶
# systemd
sudo journalctl -u tokenpak -f
sudo journalctl -u tokenpak --since "1 hour ago"
# Docker
docker logs tokenpak -f
# CLI health check
tokenpak status
tokenpak status --full
# Cost tracking
tokenpak cost --today
tokenpak cost --week
tokenpak savings --lifetime
# Dashboard
# http://localhost:8766/dashboard (runs alongside the proxy)
Scaling¶
Single Instance (default)¶
TokenPak is async (uvicorn + starlette) and handles concurrent requests well on a single machine. For most teams (<50 developers, <10K req/day), a single instance is sufficient.
Tune uvicorn workers:
Rule of thumb: workers = (2 × CPU cores) + 1.
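Assuming `serve` accepts the `--workers` flag referenced in Troubleshooting, the rule of thumb can be computed on Linux with `nproc`:

```shell
# Rule of thumb: workers = (2 x CPU cores) + 1
WORKERS=$(( 2 * $(nproc) + 1 ))
tokenpak serve --port 8766 --workers "$WORKERS"
```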
Multi-Instance (load-balanced)¶
For high throughput, run multiple instances behind a load balancer.
Requirements when load-balancing:
- Replace SQLite telemetry with a shared database (see below)
- All instances must have the same API keys
- Session stickiness is not required — TokenPak is stateless per-request
nginx load balancer config:
upstream tokenpak {
least_conn;
server 10.0.1.10:8766;
server 10.0.1.11:8766;
server 10.0.1.12:8766;
keepalive 32;
}
server {
listen 443 ssl;
server_name tokenpak.internal;
ssl_certificate /etc/nginx/certs/tokenpak.crt;
ssl_certificate_key /etc/nginx/certs/tokenpak.key;
location / {
proxy_pass http://tokenpak;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 120s;
}
}
Database Scaling¶
| Setup | Database | When to use |
|---|---|---|
| Single machine | SQLite (default) | Solo dev, small team |
| Multi-instance / shared telemetry | PostgreSQL | Team > 5, load-balanced |
| Read-heavy dashboards | PostgreSQL + read replica | Enterprise |
Switch to PostgreSQL:
pip install tokenpak[postgres]
# Set connection string
export TOKENPAK_DB=postgresql://tokenpak:password@db-host:5432/tokenpak
Run migrations:
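The migration command itself is not shown in this guide; assuming the CLI exposes a migrate subcommand (check `tokenpak --help`), the step would look like:

```shell
# Hypothetical subcommand -- verify against your installed version
export TOKENPAK_DB=postgresql://tokenpak:password@db-host:5432/tokenpak
tokenpak migrate
```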
Cache Scaling¶
| Setup | Cache | When to use |
|---|---|---|
| Single instance | In-memory (default) | Dev / small teams |
| Multi-instance | Redis | Load-balanced deployments |
Enable Redis cache:
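The exact setting name is not documented here; assuming the cache backend is selected via an environment variable (the name below is hypothetical), enabling it would look like:

```shell
# Hypothetical variable name -- confirm against your version's configuration reference
export TOKENPAK_CACHE_URL=redis://cache-host:6379/0
tokenpak serve
```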
Redis gives multi-instance cache sharing so duplicate requests (same prompt, same model) are served from cache across all nodes.
Troubleshooting¶
Proxy won't start¶
tokenpak doctor # auto-diagnoses common issues
tokenpak status # check if already running on that port
lsof -i :8766 # see what's using the port
Common fixes:
- Port already in use: tokenpak serve --port 8767 or kill the conflicting process
- Permission denied on port <1024: Use ports ≥1024 or set CAP_NET_BIND_SERVICE
- Python version too old: python --version must be 3.10+
Requests not being compressed¶
# Check compression status
tokenpak status
# Look for "Compression: enabled"
# Lower threshold for testing
TOKENPAK_COMPACT_THRESHOLD_TOKENS=100 tokenpak serve
Compression only activates above the token threshold (default: 4,500). Short requests pass through unchanged — this is correct behavior.
API key errors¶
# Verify keys are set
echo $ANTHROPIC_API_KEY | head -c 20
echo $OPENAI_API_KEY | head -c 20
# Test provider connectivity directly
curl https://api.anthropic.com/v1/models \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01"
High latency¶
# Enable debug mode to see timing breakdown
TOKENPAK_DEBUG=1 tokenpak serve
# Profile compression overhead
tokenpak benchmark --samples 10
Typical compression overhead: 5–50ms. If you're seeing >200ms, try:
- Reduce --workers (CPU contention)
- Set TOKENPAK_MODE=hybrid (avoids aggressive compression on small requests)
- Disable ML compression: pip uninstall llmlingua
Debug mode¶
Debug output shows: request routing, compression ratio per request, provider response times, cache hits/misses.
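Using the TOKENPAK_DEBUG variable documented above:

```shell
# One-off run with debug logging
TOKENPAK_DEBUG=1 tokenpak serve

# Persistent under systemd: add to the unit, then restart
#   Environment=TOKENPAK_DEBUG=1
#   sudo systemctl restart tokenpak
```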
Performance tuning¶
# Calibrate workers for your hardware (run once)
tokenpak calibrate
# Check recommended settings
cat ~/.tokenpak/calibration.json
Example Deployments¶
Local (Single Machine)¶
Best for: Solo developer, personal use, testing.
# Install
pip install tokenpak[tiktoken]
# Set API keys in shell profile
echo 'export ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.bashrc
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.bashrc
source ~/.bashrc
# Start proxy
tokenpak serve --port 8766
# Point Claude Code at proxy
export ANTHROPIC_BASE_URL=http://localhost:8766
# View dashboard
open http://localhost:8766/dashboard
For persistence, install the user-level systemd service:
# (See Running as a Service → systemd, using --user variant)
systemctl --user enable tokenpak
systemctl --user start tokenpak
AWS (EC2 + ALB)¶
Best for: Team use, high availability.
Architecture:
Internet → ALB (HTTPS:443) → EC2 Auto Scaling Group (tokenpak:8766)
↓
RDS PostgreSQL (telemetry)
ElastiCache Redis (cache)
Step-by-step:
- Launch EC2 instance (t3.small minimum, t3.medium recommended)
  - AMI: Ubuntu 22.04 LTS
  - Security group: allow port 8766 from ALB security group only
- Install TokenPak:
- Store secrets in AWS Secrets Manager. Retrieve at startup (in /etc/tokenpak/secrets.env):
aws secretsmanager get-secret-value \
  --secret-id tokenpak/api-keys \
  --query SecretString --output text \
  | jq -r 'to_entries[] | "\(.key)=\(.value)"' \
  > /etc/tokenpak/secrets.env
chmod 600 /etc/tokenpak/secrets.env
- Configure for PostgreSQL + Redis:
- Set up systemd service (see Running as a Service section above)
- Create ALB:
  - Target group: HTTP, port 8766, health check path /health
  - Listener: HTTPS:443 → target group
  - SSL cert via AWS ACM
- Auto Scaling Group with the EC2 as launch template; scale on CPU > 60%.
Estimated cost: ~$30–60/month (t3.small × 2 + RDS db.t3.micro + ElastiCache cache.t3.micro).
GCP (Cloud Run)¶
Best for: Serverless, pay-per-request, zero ops.
Architecture:
Step-by-step:
- Build and push image:
- Store secrets in Secret Manager:
- Deploy to Cloud Run:
gcloud run deploy tokenpak \
  --image gcr.io/YOUR_PROJECT/tokenpak \
  --platform managed \
  --region us-central1 \
  --port 8766 \
  --no-allow-unauthenticated \
  --set-secrets "ANTHROPIC_API_KEY=anthropic-api-key:latest,OPENAI_API_KEY=openai-api-key:latest" \
  --set-env-vars "TOKENPAK_MODE=hybrid,TOKENPAK_HOST=0.0.0.0" \
  --min-instances 1 \
  --max-instances 10 \
  --memory 512Mi \
  --cpu 1
- Restrict access:
- Point clients at the Cloud Run URL with Authorization: Bearer $(gcloud auth print-identity-token).
Estimated cost: ~$5–20/month at moderate usage (Cloud Run is billed per-request).
Azure (Container Apps)¶
Best for: Teams already on Azure, enterprise compliance requirements.
Architecture:
Clients → Azure Container Apps (tokenpak, auto-scale)
↓
Azure Database for PostgreSQL + Azure Cache for Redis
Step-by-step:
- Store secrets in Key Vault:
- Create Container Apps environment:
- Deploy:
az containerapp create \
  --name tokenpak \
  --resource-group myRG \
  --environment tokenpak-env \
  --image tokenpak/tokenpak:latest \
  --target-port 8766 \
  --ingress internal \
  --min-replicas 1 \
  --max-replicas 10 \
  --secrets \
    "anthropic-key=keyvaultref:https://mykeyvault.vault.azure.net/secrets/anthropic-api-key,identityref:system" \
    "openai-key=keyvaultref:https://mykeyvault.vault.azure.net/secrets/openai-api-key,identityref:system" \
  --env-vars \
    "ANTHROPIC_API_KEY=secretref:anthropic-key" \
    "OPENAI_API_KEY=secretref:openai-key" \
    "TOKENPAK_MODE=hybrid" \
    "TOKENPAK_HOST=0.0.0.0"
- Restrict ingress to your VNet or specific IP ranges.
Estimated cost: ~$15–40/month (Container Apps consumption plan scales to zero when idle).
Upgrading¶
If running as a service:
pip install --upgrade tokenpak
sudo systemctl restart tokenpak # or: docker compose pull && docker compose up -d
Uninstall¶
# Stop service
sudo systemctl stop tokenpak
sudo systemctl disable tokenpak
# Remove package
pip uninstall tokenpak
# Remove data (optional — deletes all telemetry and vault indexes)
rm -rf ~/.tokenpak
sudo rm -rf /etc/tokenpak
See Also¶
- TROUBLESHOOTING.md — FAQ, common errors, performance tuning
- ARCHITECTURE.md — internals and design decisions
- API.md — proxy API reference
- docs/guides/team-server.md — shared team proxy setup