TokenPak Environment Variables¶

All configuration is via environment variables (or ~/.tokenpak/config.yaml). Env vars always take precedence over config file values.

Core Server¶

Variable	Default	Type	Description
`TOKENPAK_PORT`	`8766`	int	Proxy listen port
`TOKENPAK_BIND_ADDRESS`	`127.0.0.1`	str	Proxy listen address (use `0.0.0.0` for LAN/Docker)
`TOKENPAK_PROXY_KEY`	`""`	str	Optional pre-shared key required on all requests (empty = disabled)
`TOKENPAK_DASHBOARD_AUTH`	`true`	bool	Require auth token to access `/stats`, `/health`, `/vault` dashboards
`TOKENPAK_DB`	`<proxy_dir>/monitor.db`	str	Path to SQLite monitor database
`TOKENPAK_PROFILE`	`balanced`	str	Active cost/quality profile: `balanced`, `economy`, `quality`

Compression / Context Reduction¶

Variable	Default	Type	Description
`TOKENPAK_COMPACT`	`true`	bool	Master on/off switch for context compaction
`TOKENPAK_MODE`	`hybrid`	str	Compilation mode: `strict`, `hybrid`, `aggressive`
`TOKENPAK_COMPACT_MAX_CHARS`	`120`	int	Max chars per compacted text segment
`TOKENPAK_COMPACT_THRESHOLD_TOKENS`	`4500`	int	Skip compaction below this token count
`TOKENPAK_COMPACT_MAX_TOKENS`	`2000`	int	Hard cap on tokens injected from compaction
`TOKENPAK_COMPACT_CACHE_SIZE`	`2000`	int	LRU cache size for compaction results
`MAX_COMPRESSION_TIME_MS`	`5000`	int	Max time budget (ms) for entire compression pipeline; 0 = no cap

Vault / Retrieval¶

Variable	Default	Type	Description
`TOKENPAK_VAULT_INDEX`	`~/vault/.tokenpak`	str	Path to vault `.tokenpak` index directory
`TOKENPAK_VAULT_INDEX_RELOAD_INTERVAL`	`300`	int	Seconds between vault index reloads
`TOKENPAK_VAULT_MEMORY_MAX`	`268435456` (256MB)	int	Max bytes for Tier 2 LRU content cache
`TOKENPAK_VAULT_CACHE_PRELOAD`	`200`	int	Number of recently-modified blocks to preload into cache at startup
`TOKENPAK_INJECT_BUDGET`	`4000`	int	Max tokens to inject from vault per request
`TOKENPAK_INJECT_TOP_K`	`5`	int	Max vault blocks to inject per request
`TOKENPAK_INJECT_MIN_SCORE`	`2.0`	float	Minimum BM25 score to include a block
`TOKENPAK_INJECT_SKIP_MODELS`	`haiku`	str	Comma-separated model name prefixes to skip vault injection
`TOKENPAK_INJECT_MIN_PROMPT`	`1000`	int	Minimum prompt chars before vault injection is attempted
`TOKENPAK_RETRIEVAL_BACKEND`	`json_blocks`	str	Vault retrieval backend: `json_blocks` or `sqlite`

Upstream / Routing¶

Variable	Default	Type	Description
`TOKENPAK_UPSTREAM_TIMEOUT`	`300`	int	Upstream request timeout in seconds
`TOKENPAK_ROUTER_ENABLED`	`true`	bool	Enable model/provider router
`TOKENPAK_OLLAMA_UPSTREAM`	`http://100.80.241.118:11434`	str	Ollama upstream URL
`TOKENPAK_OLLAMA_TIMEOUT`	`20`	int	Ollama connection timeout in seconds
`TOKENPAK_MAX_RETRIES`	`3`	int	Max upstream retry attempts
`TOKENPAK_BACKOFF_BASE`	`1.0`	float	Retry backoff base multiplier (seconds)
`TOKENPAK_BACKOFF_CAP`	`32.0`	float	Retry backoff maximum cap (seconds)
`TOKENPAK_KEY_ROTATION`	`failover`	str	API key rotation strategy: `failover`, `round-robin`

Rate Limiting & Request Guards¶

Variable	Default	Type	Description
`TOKENPAK_RATE_LIMIT_RPM`	`60`	int	Max requests per minute per client IP (0 = disabled)
`TOKENPAK_MAX_REQUEST_SIZE`	`10485760` (10MB)	int	Max request body size in bytes
`TOKENPAK_STRICT_MODE`	`false`	bool	Strict request validation (reject ambiguous/malformed requests)
`TOKENPAK_VALIDATION_GATE`	`true`	bool	Enable pre-flight token budget validation gate
`TOKENPAK_VALIDATION_GATE_BUDGET_CAP`	`120000`	int	Max tokens allowed through validation gate
`TOKENPAK_VALIDATION_GATE_SOFT`	`true`	bool	Soft mode: warn instead of reject on validation failures

Budget & Cost Control¶

Variable	Default	Type	Description
`TOKENPAK_BUDGET_TOTAL`	`12000`	int	Per-request token budget target
`TOKENPAK_BUDGET_DAILY_LIMIT_USD`	`0`	float	Daily spend cap in USD (0 = disabled)
`TOKENPAK_BUDGET_ALERT_PCT`	`80`	float	Alert threshold as % of daily limit

Feature Flags (Tier 1 — default OFF)¶

Variable	Default	Type	Description
`TOKENPAK_SEMANTIC_CACHE`	`false`	bool	Short-circuit cache for duplicate/similar queries
`TOKENPAK_PREFIX_REGISTRY`	`false`	bool	Stable prefix tracking for cache optimization
`TOKENPAK_COMPRESSION_DICT`	`false`	bool	Post-compaction dictionary compression
`TOKENPAK_TRACE`	`false`	bool	Pipeline tracing (debug)

Feature Flags (Tier 2A — default OFF)¶

Variable	Default	Type	Description
`TOKENPAK_ERROR_NORMALIZER`	`false`	bool	Normalize error responses across providers
`TOKENPAK_BUDGET_CONTROLLER`	`false`	bool	Enforce per-request token budget limits
`TOKENPAK_REQUEST_LOGGER`	`false`	bool	Structured request/response logging
`TOKENPAK_SALIENCE_ROUTER`	`false`	bool	Content-type-aware extraction before compaction

Feature Flags (Tier 2B — default OFF)¶

Variable	Default	Type	Description
`TOKENPAK_CACHE_REGISTRY`	`false`	bool	Unified stable/volatile cache registry
`TOKENPAK_RETRIEVAL_WATCHDOG`	`false`	bool	Monitor retrieval latency and quality
`TOKENPAK_FAILURE_MEMORY`	`false`	bool	Track and learn from upstream failures
`TOKENPAK_FIDELITY_TIERS`	`false`	bool	Tiered response fidelity based on cost/quality tradeoffs

Feature Flags (Phase 3 — default OFF)¶

Variable	Default	Type	Description
`TOKENPAK_SESSION_CAPSULES`	`false`	bool	Session-scoped capsule summarization
`TOKENPAK_PRECONDITION_GATES`	`false`	bool	Guard rails before expensive operations
`TOKENPAK_QUERY_REWRITER`	`false`	bool	Automatic query rewriting for better retrieval
`TOKENPAK_STABILITY_SCORER`	`false`	bool	Score and rank response stability

Other Features¶

Variable	Default	Type	Description
`TOKENPAK_CAPSULE_BUILDER`	`false`	bool	Enable capsule builder compression stage
`TOKENPAK_CAPSULE_MIN_CHARS`	`400`	int	Minimum block chars for capsule compression
`TOKENPAK_CAPSULE_HOT_WINDOW`	`2`	int	Trailing messages excluded from capsule compression
`TOKENPAK_SKELETON_ENABLED`	`true`	bool	Strip function bodies from code blocks before injection
`TOKENPAK_SHADOW_ENABLED`	`true`	bool	Shadow reader for passive observation
`TOKENPAK_TERM_RESOLVER_ENABLED`	`false`	bool	Term-card resolver (inline term definitions)
`TOKENPAK_TERM_RESOLVER_TOP_K`	`3`	int	Max term cards to inject per request
`TOKENPAK_TERM_RESOLVER_MAX_BYTES`	`200`	int	Max bytes per term card
`TOKENPAK_CHAT_FOOTER`	`false`	bool	Append cost/token footer to chat responses
`TOKENPAK_HTTP100_KEEPALIVE`	`false`	bool	Send HTTP 100 Continue for keepalive

WebSocket¶

Variable	Default	Type	Description
`TOKENPAK_WS_PORT`	`8767`	int	WebSocket proxy port
`TOKENPAK_WS_MAX_CONNECTIONS`	`50`	int	Max concurrent WebSocket connections

Generated from proxy.py and packages/core/proxy.py on 2026-03-28. To add a variable, use _cfg("config.key", default, "TOKENPAK_VAR_NAME", type) in proxy.py.