Skip to content

TokenPak Environment Variables

All configuration is via environment variables (or ~/.tokenpak/config.yaml). Env vars always take precedence over config file values.

Core Server

Variable Default Type Description
TOKENPAK_PORT 8766 int Proxy listen port
TOKENPAK_BIND_ADDRESS 127.0.0.1 str Proxy listen address (use 0.0.0.0 for LAN/Docker)
TOKENPAK_PROXY_KEY "" str Optional pre-shared key required on all requests (empty = disabled)
TOKENPAK_DASHBOARD_AUTH true bool Require auth token to access /stats, /health, /vault dashboards
TOKENPAK_DB <proxy_dir>/monitor.db str Path to SQLite monitor database
TOKENPAK_PROFILE balanced str Active cost/quality profile: balanced, economy, quality

Compression / Context Reduction

Variable Default Type Description
TOKENPAK_COMPACT true bool Master on/off switch for context compaction
TOKENPAK_MODE hybrid str Compilation mode: strict, hybrid, aggressive
TOKENPAK_COMPACT_MAX_CHARS 120 int Max chars per compacted text segment
TOKENPAK_COMPACT_THRESHOLD_TOKENS 4500 int Skip compaction below this token count
TOKENPAK_COMPACT_MAX_TOKENS 2000 int Hard cap on tokens injected from compaction
TOKENPAK_COMPACT_CACHE_SIZE 2000 int LRU cache size for compaction results
MAX_COMPRESSION_TIME_MS 5000 int Max time budget (ms) for entire compression pipeline; 0 = no cap

Vault / Retrieval

Variable Default Type Description
TOKENPAK_VAULT_INDEX ~/vault/.tokenpak str Path to vault .tokenpak index directory
TOKENPAK_VAULT_INDEX_RELOAD_INTERVAL 300 int Seconds between vault index reloads
TOKENPAK_VAULT_MEMORY_MAX 268435456 (256MB) int Max bytes for Tier 2 LRU content cache
TOKENPAK_VAULT_CACHE_PRELOAD 200 int Number of recently-modified blocks to preload into cache at startup
TOKENPAK_INJECT_BUDGET 4000 int Max tokens to inject from vault per request
TOKENPAK_INJECT_TOP_K 5 int Max vault blocks to inject per request
TOKENPAK_INJECT_MIN_SCORE 2.0 float Minimum BM25 score to include a block
TOKENPAK_INJECT_SKIP_MODELS haiku str Comma-separated model name prefixes to skip vault injection
TOKENPAK_INJECT_MIN_PROMPT 1000 int Minimum prompt chars before vault injection is attempted
TOKENPAK_RETRIEVAL_BACKEND json_blocks str Vault retrieval backend: json_blocks or sqlite

Upstream / Routing

Variable Default Type Description
TOKENPAK_UPSTREAM_TIMEOUT 300 int Upstream request timeout in seconds
TOKENPAK_ROUTER_ENABLED true bool Enable model/provider router
TOKENPAK_OLLAMA_UPSTREAM http://100.80.241.118:11434 str Ollama upstream URL
TOKENPAK_OLLAMA_TIMEOUT 20 int Ollama connection timeout in seconds
TOKENPAK_MAX_RETRIES 3 int Max upstream retry attempts
TOKENPAK_BACKOFF_BASE 1.0 float Retry backoff base multiplier (seconds)
TOKENPAK_BACKOFF_CAP 32.0 float Retry backoff maximum cap (seconds)
TOKENPAK_KEY_ROTATION failover str API key rotation strategy: failover, round-robin

Rate Limiting & Request Guards

Variable Default Type Description
TOKENPAK_RATE_LIMIT_RPM 60 int Max requests per minute per client IP (0 = disabled)
TOKENPAK_MAX_REQUEST_SIZE 10485760 (10MB) int Max request body size in bytes
TOKENPAK_STRICT_MODE false bool Strict request validation (reject ambiguous/malformed requests)
TOKENPAK_VALIDATION_GATE true bool Enable pre-flight token budget validation gate
TOKENPAK_VALIDATION_GATE_BUDGET_CAP 120000 int Max tokens allowed through validation gate
TOKENPAK_VALIDATION_GATE_SOFT true bool Soft mode: warn instead of reject on validation failures

Budget & Cost Control

Variable Default Type Description
TOKENPAK_BUDGET_TOTAL 12000 int Per-request token budget target
TOKENPAK_BUDGET_DAILY_LIMIT_USD 0 float Daily spend cap in USD (0 = disabled)
TOKENPAK_BUDGET_ALERT_PCT 80 float Alert threshold as % of daily limit

Feature Flags (Tier 1 — default OFF)

Variable Default Type Description
TOKENPAK_SEMANTIC_CACHE false bool Short-circuit cache for duplicate/similar queries
TOKENPAK_PREFIX_REGISTRY false bool Stable prefix tracking for cache optimization
TOKENPAK_COMPRESSION_DICT false bool Post-compaction dictionary compression
TOKENPAK_TRACE false bool Pipeline tracing (debug)

Feature Flags (Tier 2A — default OFF)

Variable Default Type Description
TOKENPAK_ERROR_NORMALIZER false bool Normalize error responses across providers
TOKENPAK_BUDGET_CONTROLLER false bool Enforce per-request token budget limits
TOKENPAK_REQUEST_LOGGER false bool Structured request/response logging
TOKENPAK_SALIENCE_ROUTER false bool Content-type-aware extraction before compaction

Feature Flags (Tier 2B — default OFF)

Variable Default Type Description
TOKENPAK_CACHE_REGISTRY false bool Unified stable/volatile cache registry
TOKENPAK_RETRIEVAL_WATCHDOG false bool Monitor retrieval latency and quality
TOKENPAK_FAILURE_MEMORY false bool Track and learn from upstream failures
TOKENPAK_FIDELITY_TIERS false bool Tiered response fidelity based on cost/quality tradeoffs

Feature Flags (Phase 3 — default OFF)

Variable Default Type Description
TOKENPAK_SESSION_CAPSULES false bool Session-scoped capsule summarization
TOKENPAK_PRECONDITION_GATES false bool Guard rails before expensive operations
TOKENPAK_QUERY_REWRITER false bool Automatic query rewriting for better retrieval
TOKENPAK_STABILITY_SCORER false bool Score and rank response stability

Other Features

Variable Default Type Description
TOKENPAK_CAPSULE_BUILDER false bool Enable capsule builder compression stage
TOKENPAK_CAPSULE_MIN_CHARS 400 int Minimum block chars for capsule compression
TOKENPAK_CAPSULE_HOT_WINDOW 2 int Trailing messages excluded from capsule compression
TOKENPAK_SKELETON_ENABLED true bool Strip function bodies from code blocks before injection
TOKENPAK_SHADOW_ENABLED true bool Shadow reader for passive observation
TOKENPAK_TERM_RESOLVER_ENABLED false bool Term-card resolver (inline term definitions)
TOKENPAK_TERM_RESOLVER_TOP_K 3 int Max term cards to inject per request
TOKENPAK_TERM_RESOLVER_MAX_BYTES 200 int Max bytes per term card
TOKENPAK_CHAT_FOOTER false bool Append cost/token footer to chat responses
TOKENPAK_HTTP100_KEEPALIVE false bool Send HTTP 100 Continue for keepalive

WebSocket

Variable Default Type Description
TOKENPAK_WS_PORT 8767 int WebSocket proxy port
TOKENPAK_WS_MAX_CONNECTIONS 50 int Max concurrent WebSocket connections

Generated from proxy.py and packages/core/proxy.py on 2026-03-28. To add a variable, use _cfg("config.key", default, "TOKENPAK_VAR_NAME", type) in proxy.py.