Recipe: Multi-Provider Fallback
Status: Conceptual. This recipe describes a failover pattern. The declarative fallback_to config keys and the failover log output shown below are illustrative: they are not a validated config surface of the current TokenPak release, and the proxy does not emit those messages inline. Treat this as a design sketch, not a copy-paste runbook. Confirm any CLI command against tokenpak --help before relying on it.
What this solves: Keeping requests flowing by routing them to a backup provider when your primary provider has an outage or hits a rate limit.
Prerequisites
- TokenPak installed: pip install tokenpak
- Valid API keys for the providers you intend to use
- tokenpak CLI available in your shell (tokenpak --help)
The pattern (illustrative config)
The idea is to declare a primary model and a fallback target so that a failed request retries against a different provider. The YAML below is a conceptual illustration of how such a config might read — it is not the validated schema of the shipped proxy:
# ILLUSTRATIVE ONLY: not a validated TokenPak config schema
providers:
  openai:
    type: openai
    api_key: ${OPENAI_API_KEY}
  anthropic:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
models:
  gpt-4:
    provider: openai
    # Conceptual: try OpenAI first, then Anthropic
    fallback_to: claude-3-sonnet
  claude-3-sonnet:
    provider: anthropic
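To make the intent of the illustrative config concrete, here is a minimal Python sketch that walks a fallback_to chain and returns the models to try, in order. It is not TokenPak code. The MODELS dictionary simply mirrors the illustrative YAML above, and resolve_chain is a hypothetical helper name introduced for this example. It raises an error if a chain points back to a model it has already visited.

```python
# Hypothetical helper, not part of TokenPak: walks an illustrative
# fallback_to graph and returns the models to try, in order.

# Mirrors the illustrative YAML as a plain dict (an assumption for this sketch).
MODELS = {
    "gpt-4": {"provider": "openai", "fallback_to": "claude-3-sonnet"},
    "claude-3-sonnet": {"provider": "anthropic"},
}


def resolve_chain(models: dict, start: str) -> list[str]:
    """Return the ordered fallback chain beginning at `start`."""
    chain: list[str] = []
    current = start
    while current is not None:
        if current not in models:
            raise KeyError(f"unknown model in chain: {current}")
        if current in chain:
            # A chain that revisits a model would retry forever.
            raise ValueError(f"fallback loop detected at: {current}")
        chain.append(current)
        # A model without fallback_to ends the chain.
        current = models[current].get("fallback_to")
    return chain


if __name__ == "__main__":
    print(resolve_chain(MODELS, "gpt-4"))  # ['gpt-4', 'claude-3-sonnet']
```

Resolving the whole chain before the first request surfaces a misconfigured graph at startup rather than in the middle of an incident.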
What's real today
- Start the proxy with tokenpak serve (default http://127.0.0.1:8766).
- Validate a proxy config file with tokenpak config-check <file.json>. The shipped proxy config surface is a JSON file; for the server block, config-check recommends server: { port: 8766, host: '127.0.0.1' }. The elaborate fallback_to model graph above is not part of that validated surface.
- TokenPak's proxy is a byte-preserving passthrough: it forwards request and response bodies verbatim. It does not inject failover status messages or extra fields into the response body.
A request against the proxy looks like:
tokenpak serve # listens on http://127.0.0.1:8766
# In another terminal:
curl -X POST http://127.0.0.1:8766/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say OK"}]
  }'
The response body is whatever the upstream provider returns, passed through unchanged.
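For application code, the same request can be sketched in Python using only the standard library. The URL, headers, and body are taken from the curl example above. The function names build_request and send_request are hypothetical and chosen for this sketch. It assumes the proxy is already running on its default address.

```python
import json
import urllib.request

# Default proxy address used in this recipe.
PROXY_URL = "http://127.0.0.1:8766/v1/messages"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text, unparsed."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A typical call would be send_request(build_request(os.environ["ANTHROPIC_API_KEY"], "Say OK")), after importing os. Because the proxy is a passthrough, the returned text is the upstream provider's response body.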
Designing a fallback strategy
If you implement failover (in your application or via your own orchestration in front of the proxy), the principles below apply regardless of how the routing is configured:
- Chain across different providers, not just different models of the same provider — an outage often takes out a whole provider.
- Avoid fallback loops — never let a chain point back to a model already tried.
- Pre-check every key in the chain so the fallback isn't itself unauthenticated.
- Check the fallback's rate limits — a fallback with a much lower limit can become the new bottleneck during an incident.
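The first three principles can be sketched as application-level failover in Python. This is a minimal illustration under stated assumptions, not a TokenPak feature. The names validate_chain and call_with_fallback, the RETRYABLE_STATUS set, and the injected send callable are all hypothetical. The send callable is expected to perform the actual HTTP call and return a (status_code, body) pair. Rate-limit budgeting for the fallback is not modelled here.

```python
from typing import Callable, Mapping, Sequence

# Status codes treated as "try the next provider". This set is an assumption;
# tune it to the providers you actually use.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

# A sender takes (model, api_key, payload) and returns (status_code, body).
Sender = Callable[[str, str, dict], tuple[int, str]]


def validate_chain(chain: Sequence[tuple[str, str]], keys: Mapping[str, str]) -> None:
    """Reject empty chains, loops, and missing keys before sending anything."""
    if not chain:
        raise ValueError("fallback chain is empty")
    seen = set()
    for provider, model in chain:
        if model in seen:
            raise ValueError(f"fallback loop: {model} appears twice")
        seen.add(model)
        if not keys.get(provider):
            raise ValueError(f"missing API key for provider: {provider}")


def call_with_fallback(
    chain: Sequence[tuple[str, str]],
    keys: Mapping[str, str],
    payload: dict,
    send: Sender,
) -> tuple[str, int, str]:
    """Try each (provider, model) in order; return (model, status, body)."""
    validate_chain(chain, keys)
    last = ("", 0, "")
    for provider, model in chain:
        try:
            status, body = send(model, keys[provider], payload)
        except OSError as exc:
            # Connection errors and timeouts; status 0 is a local convention.
            last = (model, 0, str(exc))
            continue
        if status not in RETRYABLE_STATUS:
            return model, status, body
        last = (model, status, body)
    # Every entry failed with a retryable error; report the last attempt.
    return last
```

Passing send in as a parameter keeps the routing logic separate from the transport, so the chain can be exercised with a fake sender before any real provider is involved.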