Error Handling & Troubleshooting

TokenPak provides normalized error handling across all providers, automatic retries, and fallback chains.

Common Errors & Solutions

1. Connection Refused (Proxy Not Running)

Error Message:

ConnectionRefusedError: [Errno 111] Connection refused
Failed to connect to http://127.0.0.1:8000

Cause: The TokenPak proxy server is not running.

Solution:

# Start the proxy
tokenpak serve

# (in another terminal)
python your_script.py

Prevention: Keep the proxy running in a background process or systemd service.
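
A script can also check that the proxy is reachable before sending any requests. The following is a minimal sketch using only the standard library; the function name `proxy_is_up` is hypothetical, and the host and port are the ones shown in the error message above.

```python
import socket


def proxy_is_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `proxy_is_up()` at startup lets a script fail early with a clear message instead of surfacing a `ConnectionRefusedError` on the first request. It only confirms that a TCP listener exists, not that the listener is TokenPak.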

2. Authentication Failed (Invalid API Key)

Error Message:

AuthenticationError: Invalid API key for provider: anthropic
Check your ANTHROPIC_API_KEY environment variable

Cause: Missing or incorrect API key.

Solution:

# Check if key is set
echo $ANTHROPIC_API_KEY

# Set the key
export ANTHROPIC_API_KEY="sk-ant-..."

# Restart the proxy
tokenpak serve

Prevention:

- Use a .env file (see Installation).
- Check the key format: keys should start with sk-ant-, sk-, or AIza.
- Rotate expired keys immediately.
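
The key-format check can be automated. This sketch guesses the provider from a key's prefix; the prefix-to-provider mapping and the name `guess_provider` are assumptions for illustration and are not part of TokenPak. A matching prefix does not prove the key is valid, only that it has the expected shape.

```python
from typing import Optional

# Assumed prefix -> provider mapping. "sk-ant-" is listed before "sk-"
# so an Anthropic key is not mistaken for a plain "sk-" key.
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("AIza", "google"),
    ("sk-", "openai"),
]


def guess_provider(api_key: str) -> Optional[str]:
    """Return the provider whose prefix matches the key, or None."""
    key = api_key.strip()  # stray whitespace is a common copy-paste error
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None
```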

3. Rate Limit Exceeded

Error Message:

RateLimitError: Rate limit exceeded (429)
Retry-After: 60

Cause: Too many requests to the provider in a short time.

Solution (Automatic): TokenPak automatically retries with exponential backoff:

Attempt 1: Wait 1 second, retry
Attempt 2: Wait 2 seconds, retry
Attempt 3: Wait 4 seconds, retry
Attempt 4: Wait 8 seconds, retry
(Circuit breaker opens, switch to fallback provider)
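
The schedule above doubles the wait on every attempt, so four retries spend 1 + 2 + 4 + 8 = 15 seconds waiting in total. A small sketch that reproduces the schedule; the function name and the optional `cap` parameter are illustrative and do not describe TokenPak's internals.

```python
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0,
                   cap: Optional[float] = None) -> List[float]:
    """Return the wait (in seconds) before each retry: base * factor**i."""
    delays = [base * factor ** i for i in range(attempts)]
    if cap is not None:
        # Optionally bound each wait so long retry chains stay reasonable.
        delays = [min(d, cap) for d in delays]
    return delays
```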

Solution (Manual):

from tokenpak import Client, RateLimitError
import time

client = Client(api_key="...", model="claude-opus-4-6")

try:
    response = client.messages.create(...)
except RateLimitError as e:
    wait_seconds = int(e.headers.get("Retry-After", 60))
    print(f"Rate limited. Waiting {wait_seconds}s...")
    time.sleep(wait_seconds)
    # Retry
    response = client.messages.create(...)

Prevention:

- Implement request batching (fewer, larger requests)
- Use fallback chains for load balancing
- Monitor your request frequency
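
The manual handler above converts Retry-After with `int()`. HTTP also allows this header to carry an HTTP-date instead of a number of seconds, in which case `int()` raises `ValueError`. A more tolerant parser might look like the sketch below; the function name and the 60-second default are assumptions, and whether a given provider ever sends the date form is not confirmed here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 60.0,
                        now: Optional[datetime] = None) -> float:
    """Parse a Retry-After value given as delay-seconds or an HTTP-date."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the default wait.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately".
    return max(0.0, (when - now).total_seconds())
```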

4. Model Not Found

Error Message:

ModelNotFoundError: Model 'gpt-7' not found in provider: openai
Available models: gpt-4o, gpt-4-turbo, gpt-3.5-turbo

Cause: Using a model name that the provider doesn't support.

Solution:

from tokenpak import Client

# Create the client with a model name the provider supports
client = Client(api_key="...", model="gpt-4o")

# Use a valid model name
response = client.messages.create(
    model="gpt-4o",  # Valid
    messages=[...]
)

Common Model Names:

| Provider | Models |
|----------|--------|
| Anthropic | `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-3-5` |
| OpenAI | `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo` |
| Google | `gemini-pro`, `gemini-pro-vision`, `gemini-ultra` |

Prevention: Hardcode model names. If a model name must come from user input, validate it against a fixed list before sending the request.
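
One way to avoid passing arbitrary input through is an allowlist check. The sketch below uses the model names from the table above; `ALLOWED_MODELS` and `check_model` are hypothetical names, and the list should be kept in sync with what your providers actually offer.

```python
# Allowlist built from the "Common Model Names" table.
ALLOWED_MODELS = {
    "anthropic": {"claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-3-5"},
    "openai": {"gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"},
    "google": {"gemini-pro", "gemini-pro-vision", "gemini-ultra"},
}


def check_model(provider: str, model: str) -> str:
    """Return model unchanged if it is allowlisted, else raise ValueError."""
    allowed = ALLOWED_MODELS.get(provider, set())
    if model not in allowed:
        choices = ", ".join(sorted(allowed)) or "none"
        raise ValueError(
            f"Model {model!r} is not allowed for provider {provider!r}. "
            f"Available: {choices}"
        )
    return model
```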

5. Provider Timeout

Error Message:

TimeoutError: Request to anthropic timed out after 300 seconds

Cause: Provider took too long to respond.

Solution (Automatic): TokenPak switches to the fallback provider if the primary times out.

Solution (Manual):

client = Client(
    api_key="...",
    model="claude-opus-4-6",
    timeout=60  # 60 second timeout
)

try:
    response = client.messages.create(...)
except TimeoutError:
    # Try fallback manually
    client.model = "gemini-pro"
    response = client.messages.create(...)

Prevention:

- Use fallback chains
- Set reasonable timeouts
- Monitor provider status

6. Token Limit Exceeded

Error Message:

TokenLimitError: Message exceeds max_tokens limit (5120 > 4096)

Cause: Request too large for the model.

Solution:

# Option 1: Reduce message size
short_context = "Summary of relevant context only..."

# Option 2: Use compression (automatic)
client = Client(
    api_key="...",
    model="claude-opus-4-6",
    compression=True  # Auto-compress context
)

# Option 3: Split into multiple requests
# (batch processing)

Prevention:

- Use count_tokens() before making requests
- Implement compression (automatic in FREE)
- Use document injection selectively
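
Option 3 (splitting into multiple requests) can be sketched as a simple chunker. Word count is used here only as a rough stand-in for token count, which is an assumption of this example; for an accurate figure, measure each chunk with count_tokens() before sending. The function name is hypothetical.

```python
from typing import List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk can then be sent as its own request, at the cost of the model not seeing the other chunks unless a summary is carried forward.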

7. Invalid Configuration

Error Message:

ConfigError: Invalid config.yaml syntax at line 5:
  compression.enabled must be a boolean, got 'yes'

Cause: Malformed YAML or invalid option.

Solution:

# Wrong
compression:
  enabled: yes  # ❌ Should be true/false

# Right
compression:
  enabled: true  # ✅

Validation:

# Validate config before starting
tokenpak validate --config config.yaml

# Shows all errors

Prevention:

- Use a YAML validator: https://yamllint.com/
- Check indentation (spaces, not tabs)
- Refer to the Installation guide for examples
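
The same kind of type check can be run in your own tooling once the file has been parsed into a dictionary. Some YAML parsers read an unquoted yes as a boolean and others as a string, so checking the type of the parsed value is more reliable than inspecting the file by eye. This is a rough sketch: the function name is hypothetical, it covers only the two options shown in this guide, and it does not replace tokenpak validate.

```python
from typing import Any, Dict, List


def find_config_errors(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in an already-parsed config."""
    errors: List[str] = []

    compression = config.get("compression", {})
    if not isinstance(compression, dict):
        errors.append("compression must be a mapping")
    elif "enabled" in compression and not isinstance(compression["enabled"], bool):
        errors.append(
            f"compression.enabled must be a boolean, got {compression['enabled']!r}"
        )

    fallback = config.get("fallback", [])
    if not isinstance(fallback, list) or not all(isinstance(p, str) for p in fallback):
        errors.append("fallback must be a list of provider names")

    return errors
```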

Fallback Chains & Circuit Breaker

TokenPak automatically switches providers when the primary fails.

How It Works

provider: anthropic
fallback:
  - google      # Try if Anthropic fails
  - openai      # Try if Google fails

Request flow:

1. Try Anthropic
   ├─ Success? ✅ Return response
   ├─ Timeout? → Try Google
   ├─ Rate limit? → Wait then retry
   └─ Permanent error? → Try Google

2. Try Google
   ├─ Success? ✅ Return response
   └─ Fail? → Try OpenAI

3. Try OpenAI
   ├─ Success? ✅ Return response
   └─ Fail? → Return error to client
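
The request flow above amounts to trying each provider in order and returning the first success. The sketch below illustrates that idea only; it is not TokenPak's implementation, it omits the "wait then retry" branch for rate limits, and `call_with_fallback`, `AllProvidersFailed`, and the `send` callback are hypothetical names.

```python
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""

    def __init__(self, errors: Dict[str, Exception]) -> None:
        super().__init__(f"All providers failed: {', '.join(errors)}")
        self.errors = errors  # provider name -> exception it raised


def call_with_fallback(providers: Sequence[str], send: Callable[[str], T]) -> T:
    """Call send(provider) for each provider in order; return the first result."""
    errors: Dict[str, Exception] = {}
    for name in providers:
        try:
            return send(name)
        except Exception as exc:  # any failure moves on to the next provider
            errors[name] = exc
    raise AllProvidersFailed(errors)
```

Collecting the per-provider exceptions keeps the final error informative: the caller can see why each provider in the chain was skipped.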

Circuit Breaker

When a provider fails repeatedly, TokenPak opens the circuit breaker to prevent cascading failures:

State: CLOSED (normal operation)
  └─ 3 failures in 60 seconds → OPEN

State: OPEN (provider is down)
  ├─ Skip to fallback provider
  └─ After 300 seconds → HALF_OPEN

State: HALF_OPEN (testing recovery)
  ├─ Try 1 request
  ├─ Success? → CLOSED
  └─ Fail? → OPEN (restart 300s timer)
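
The three states can be modeled as a small state machine. The class below is an illustrative sketch of the transitions described above (3 failures within 60 seconds, 300-second recovery, one probe request), not TokenPak's actual code; the class, method, and parameter names are assumptions, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time
from typing import Callable, List


class CircuitBreaker:
    """CLOSED -> OPEN -> HALF_OPEN state machine for a single provider."""

    def __init__(self, failure_threshold: int = 3, window: float = 60.0,
                 recovery_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.window = window
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self._failures: List[float] = []  # timestamps of recent failures
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Return True if a request may be sent to this provider now."""
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN" and self.clock() - self._opened_at >= self.recovery_timeout:
            self.state = "HALF_OPEN"
            return True  # the single probe request
        return False  # still cooling down, or a probe is already in flight

    def record_success(self) -> None:
        self.state = "CLOSED"
        self._failures.clear()

    def record_failure(self) -> None:
        now = self.clock()
        if self.state == "HALF_OPEN":
            self._open(now)  # probe failed: restart the recovery timer
            return
        # Keep only failures inside the sliding window, then add this one.
        self._failures = [t for t in self._failures if now - t < self.window]
        self._failures.append(now)
        if len(self._failures) >= self.failure_threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state = "OPEN"
        self._opened_at = now
        self._failures.clear()
```

A caller would check `allow_request()` before contacting the provider, skip to the fallback when it returns False, and report the outcome with `record_success()` or `record_failure()`.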

Configuration

provider: anthropic
fallback:
  - google
  - openai

circuit_breaker:
  failure_threshold: 3      # Open after 3 failures
  recovery_timeout: 300     # Seconds in OPEN before a HALF_OPEN probe (5 minutes)
  half_open_requests: 1     # Test 1 request in half-open

Monitoring & Debugging

Enable Debug Logging

# Detailed logs
TOKENPAK_LOG_LEVEL=DEBUG tokenpak serve

# Write to file
tokenpak serve --log-file /tmp/tokenpak.log

Check Proxy Status

# Health check endpoint
curl http://127.0.0.1:8000/health

# Response:
# {
#   "status": "healthy",
#   "providers": {
#     "anthropic": "ok",
#     "google": "ok",
#     "openai": "degraded"
#   }
# }
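
A monitoring script can scan that response for providers that are not reporting "ok". The sketch below assumes the JSON shape shown in the sample response; the function name is hypothetical and the parsing uses only the standard library.

```python
import json
from typing import Dict


def unhealthy_providers(health_json: str) -> Dict[str, str]:
    """Return {provider: status} for every provider whose status is not "ok"."""
    payload = json.loads(health_json)
    providers = payload.get("providers", {})
    return {name: status for name, status in providers.items() if status != "ok"}
```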

View Request Log

# Last 100 requests
curl 'http://127.0.0.1:8000/logs?limit=100'

# Requests to Anthropic only
curl 'http://127.0.0.1:8000/logs?provider=anthropic'

# Requests with errors
curl 'http://127.0.0.1:8000/logs?status=error'

Test Provider Connectivity

from tokenpak import Client

client = Client(api_key="...", model="claude-opus-4-6")

# Quick test
try:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=10,
        messages=[{"role": "user", "content": "test"}]
    )
    print("✅ Connected to Anthropic")
except Exception as e:
    print(f"❌ Connection failed: {e}")

Error Types Reference

Client Errors (4xx)

| Error | Code | Cause | Solution |
|-------|------|-------|----------|
| `AuthenticationError` | 401 | Invalid API key | Check API key in `.env` |
| `PermissionError` | 403 | Key lacks permissions | Regenerate API key |
| `NotFoundError` | 404 | Model not found | Check model name |
| `RateLimitError` | 429 | Too many requests | Use fallback chain |
| `TokenLimitError` | 413 | Message too large | Compress or split |
| `ValidationError` | 400 | Invalid request format | Check request structure |

Server Errors (5xx)

| Error | Code | Cause | Solution |
|-------|------|-------|----------|
| `ServerError` | 500 | Provider internal error | Retry with fallback |
| `ServiceUnavailableError` | 503 | Provider down | Use fallback chain |
| `GatewayError` | 502 | Network issue | Check connection |
| `TimeoutError` | 504 | Request took too long | Increase timeout |

Network Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `ConnectionRefusedError` | Proxy not running | Start `tokenpak serve` |
| `ConnectionError` | Network unreachable | Check internet connection |
| `SSLError` | Certificate validation failed | Check CA certificates |
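
When writing custom retry logic, it helps to separate status codes worth retrying from those that require a change to the request. The split below is one reasonable reading of the tables above, not a rule stated by TokenPak: 429 and the 5xx codes are treated as transient, while the remaining 4xx codes need a fix to the key, model name, or request body. The names are hypothetical.

```python
# Status codes treated as transient in this sketch (an assumption).
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def should_retry(status_code: int) -> bool:
    """Return True if retrying (or falling back) might succeed."""
    return status_code in RETRYABLE_STATUS
```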

Best Practices

1. Always Use Fallback Chains

provider: anthropic
fallback:
  - google
  - openai

2. Wrap Requests in Try/Except

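import logging

# Assumes AuthenticationError is exported by tokenpak alongside RateLimitError
from tokenpak import AuthenticationError, RateLimitError

logger = logging.getLogger(__name__)

try:
    response = client.messages.create(...)
except RateLimitError:
    # Handle rate limit (wait, or switch provider)
    pass
except AuthenticationError:
    # Handle auth error (check the API key)
    pass
except Exception as e:
    # Log unexpected errors
    logger.error(f"Unexpected: {e}")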

3. Implement Exponential Backoff

TokenPak does this automatically, but for custom retries:

import time

from tokenpak import RateLimitError

def call_with_backoff(fn, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the original error
            wait = 2 ** attempt  # 1, 2, 4, ... seconds
            print(f"Attempt {attempt + 1} failed. Waiting {wait}s...")
            time.sleep(wait)

4. Monitor Token Usage

# Before making request
tokens = client.count_tokens(
    model="claude-opus-4-6",
    messages=messages
)

if tokens > 10000:
    print(f"Warning: {tokens} tokens. Consider compression.")

5. Set Timeouts

client = Client(
    api_key="...",
    timeout=30,  # 30 second timeout
    model="claude-opus-4-6"
)

Getting Help

Question? Check this guide or FAQ
Bug? Open an issue on GitHub
Stuck? Email support or check Discord

Next Steps

Monitoring: See Observability Guide
Performance: Check Feature Matrix for optimization tips
Adapters: See Adapter Reference for provider-specific notes