
Error Handling & Troubleshooting

TokenPak provides normalized error handling across all providers, automatic retries, and fallback chains.


Common Errors & Solutions

1. Connection Refused (Proxy Not Running)

Error Message:

ConnectionRefusedError: [Errno 111] Connection refused
Failed to connect to http://127.0.0.1:8000

Cause: The TokenPak proxy server is not running.

Solution:

# Start the proxy
tokenpak serve

# (in another terminal)
python your_script.py

Prevention: Keep the proxy running in a background process or systemd service.
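As one way to do that, a minimal systemd unit file is sketched below; the binary path, user, and service name are placeholders to adapt to your install:

```ini
# /etc/systemd/system/tokenpak.service
[Unit]
Description=TokenPak proxy
After=network.target

[Service]
ExecStart=/usr/local/bin/tokenpak serve
Restart=on-failure
User=tokenpak

[Install]
WantedBy=multi-user.target
```

Then start it at boot with `systemctl enable --now tokenpak`.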


2. Authentication Failed (Invalid API Key)

Error Message:

AuthenticationError: Invalid API key for provider: anthropic
Check your ANTHROPIC_API_KEY environment variable

Cause: Missing or incorrect API key.

Solution:

# Check if key is set
echo $ANTHROPIC_API_KEY

# Set the key
export ANTHROPIC_API_KEY="sk-ant-..."

# Restart the proxy
tokenpak serve

Prevention:

- Use a .env file (see Installation)
- Check the key format (should start with sk-ant-, sk-, or AIza-)
- Rotate expired keys immediately


3. Rate Limit Exceeded

Error Message:

RateLimitError: Rate limit exceeded (429)
Retry-After: 60

Cause: Too many requests to the provider in a short time.

Solution (Automatic): TokenPak automatically retries with exponential backoff:

Attempt 1: Wait 1 second, retry
Attempt 2: Wait 2 seconds, retry
Attempt 3: Wait 4 seconds, retry
Attempt 4: Wait 8 seconds, retry
(Circuit breaker opens, switch to fallback provider)

Solution (Manual):

from tokenpak import Client, RateLimitError
import time

client = Client(api_key="...", model="claude-opus-4-6")

try:
    response = client.messages.create(...)
except RateLimitError as e:
    wait_seconds = int(e.headers.get("Retry-After", 60))
    print(f"Rate limited. Waiting {wait_seconds}s...")
    time.sleep(wait_seconds)
    # Retry
    response = client.messages.create(...)

Prevention:

- Implement request batching (fewer, larger requests)
- Use fallback chains for load balancing
- Monitor your request frequency


4. Model Not Found

Error Message:

ModelNotFoundError: Model 'gpt-7' not found in provider: openai
Available models: gpt-4o, gpt-4-turbo, gpt-3.5-turbo

Cause: Using a model name that the provider doesn't support.

Solution:

# Check available models for your provider
client = Client(api_key="...", model="gpt-4o")

# Use a valid model name
response = client.messages.create(
    model="gpt-4o",  # Valid
    messages=[...]
)

Common Model Names:

| Provider  | Models                                               |
|-----------|------------------------------------------------------|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-3-5 |
| OpenAI    | gpt-4o, gpt-4-turbo, gpt-3.5-turbo                   |
| Google    | gemini-pro, gemini-pro-vision, gemini-ultra          |

Prevention: Hardcode model names; don't accept user input directly.
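One way to avoid passing raw user input through as a model name is a small allowlist check before constructing the request (the model names below are illustrative; keep the set in sync with your providers):

```python
# Allowlist of models this application is permitted to use (illustrative).
ALLOWED_MODELS = {"claude-opus-4-6", "gpt-4o", "gemini-pro"}

def resolve_model(requested: str, default: str = "gpt-4o") -> str:
    """Return the requested model if it is allowlisted, else the default."""
    if requested in ALLOWED_MODELS:
        return requested
    print(f"Unknown model {requested!r}; falling back to {default}")
    return default
```
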


5. Provider Timeout

Error Message:

TimeoutError: Request to anthropic timed out after 300 seconds

Cause: Provider took too long to respond.

Solution (TokenPak Automatic): Switches to fallback provider if primary times out.

Solution (Manual):

client = Client(
    api_key="...",
    model="claude-opus-4-6",
    timeout=60  # 60 second timeout
)

try:
    response = client.messages.create(...)
except TimeoutError:
    # Try fallback manually
    client.model = "gemini-pro"
    response = client.messages.create(...)

Prevention:

- Use fallback chains
- Set reasonable timeouts
- Monitor provider status


6. Token Limit Exceeded

Error Message:

TokenLimitError: Message exceeds max_tokens limit (5000 > 4096)

Cause: Request too large for the model.

Solution:

# Option 1: Reduce message size
short_context = "Summary of relevant context only..."

# Option 2: Use compression (automatic)
client = Client(
    api_key="...",
    model="claude-opus-4-6",
    compression=True  # Auto-compress context
)

# Option 3: Split into multiple requests
# (batch processing)

Prevention:

- Use count_tokens() before making requests
- Enable compression (automatic in the FREE tier)
- Use document injection selectively


7. Invalid Configuration

Error Message:

ConfigError: Invalid config.yaml syntax at line 5:
  compression.enabled must be a boolean, got 'yes'

Cause: Malformed YAML or invalid option.

Solution:

# Wrong
compression:
  enabled: yes  # ❌ Should be true/false

# Right
compression:
  enabled: true  # ✅

Validation:

# Validate config before starting
tokenpak validate --config config.yaml

# Shows all errors

Prevention:

- Use a YAML validator: https://yamllint.com/
- Check indentation (spaces, not tabs)
- Refer to the Installation guide for examples
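If you load the config yourself, the same type check can be done in code. A sketch that validates an already-parsed config dict (field names follow the compression example above; extend it for your other options):

```python
def validate_config(cfg: dict) -> list[str]:
    """Collect type errors from a parsed TokenPak-style config dict."""
    errors = []
    comp = cfg.get("compression", {})
    if "enabled" in comp and not isinstance(comp["enabled"], bool):
        errors.append(
            f"compression.enabled must be a boolean, got {comp['enabled']!r}"
        )
    return errors
```

Note that some YAML 1.1 parsers read an unquoted `yes` as the boolean `True`, so a check like this catches the quoted-string case; `tokenpak validate` remains the authoritative check.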


Fallback Chains & Circuit Breaker

TokenPak automatically switches providers when the primary fails.

How It Works

provider: anthropic
fallback:
  - google      # Try if Anthropic fails
  - openai      # Try if Google fails

Request flow:

1. Try Anthropic
   ├─ Success? ✅ Return response
   ├─ Timeout? → Try Google
   ├─ Rate limit? → Wait then retry
   └─ Permanent error? → Try Google

2. Try Google
   ├─ Success? ✅ Return response
   └─ Fail? → Try OpenAI

3. Try OpenAI
   ├─ Success? ✅ Return response
   └─ Fail? → Return error to client

Circuit Breaker

When a provider fails repeatedly, TokenPak opens the circuit breaker to prevent cascading failures:

State: CLOSED (normal operation)
  └─ 3 failures in 60 seconds → OPEN

State: OPEN (provider is down)
  └─ Skip to fallback provider
  └─ After 300 seconds → HALF_OPEN

State: HALF_OPEN (testing recovery)
  └─ Try 1 request
  ├─ Success? → CLOSED
  └─ Fail? → OPEN (restart 300s timer)
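The state machine above can be sketched in a few lines of Python. This is a minimal illustration of the CLOSED → OPEN → HALF_OPEN cycle with the documented defaults, not TokenPak's internal implementation:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN cycle."""

    def __init__(self, failure_threshold=3, window=60, recovery_timeout=300):
        self.failure_threshold = failure_threshold
        self.window = window                  # failures are counted within this window
        self.recovery_timeout = recovery_timeout
        self.failures = []                    # timestamps of recent failures
        self.opened_at = None                 # when the circuit last opened

    def state(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return "CLOSED"
        if now - self.opened_at >= self.recovery_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        if self.state(now) == "HALF_OPEN":
            self.opened_at = now              # failed probe: reopen, restart timer
            return
        self.failures = [t for t in self.failures if now - t < self.window]
        self.failures.append(now)
        if len(self.failures) >= self.failure_threshold:
            self.opened_at = now              # trip the breaker

    def record_success(self):
        self.failures = []
        self.opened_at = None                 # back to CLOSED
```
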

Configuration

fallback:
  - anthropic
  - google
  - openai

circuit_breaker:
  failure_threshold: 3      # Open after 3 failures
  recovery_timeout: 300     # Reset after 5 minutes
  half_open_requests: 1     # Test 1 request in half-open

Monitoring & Debugging

Enable Debug Logging

# Detailed logs
TOKENPAK_LOG_LEVEL=DEBUG tokenpak serve

# Write to file
tokenpak serve --log-file /tmp/tokenpak.log

Check Proxy Status

# Health check endpoint
curl http://127.0.0.1:8000/health

# Response:
# {
#   "status": "healthy",
#   "providers": {
#     "anthropic": "ok",
#     "google": "ok",
#     "openai": "degraded"
#   }
# }
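For automated monitoring, the same endpoint can be polled from Python. A small sketch using only the standard library (the payload fields match the sample response above):

```python
import json
import urllib.request

def degraded_providers(health: dict) -> dict:
    """Return providers whose status is not 'ok' from a /health payload."""
    return {p: s for p, s in health.get("providers", {}).items() if s != "ok"}

def check_health(base_url="http://127.0.0.1:8000") -> dict:
    """Fetch /health from the proxy and report any degraded providers."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        health = json.load(resp)
    bad = degraded_providers(health)
    if bad:
        print(f"Degraded providers: {bad}")
    return bad
```
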

View Request Log

# Last 100 requests
curl http://127.0.0.1:8000/logs?limit=100

# Requests to Anthropic only
curl 'http://127.0.0.1:8000/logs?provider=anthropic'

# Requests with errors
curl 'http://127.0.0.1:8000/logs?status=error'

Test Provider Connectivity

from tokenpak import Client

client = Client(api_key="...", model="claude-opus-4-6")

# Quick test
try:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=10,
        messages=[{"role": "user", "content": "test"}]
    )
    print("✅ Connected to Anthropic")
except Exception as e:
    print(f"❌ Connection failed: {e}")

Error Types Reference

Client Errors (4xx)

| Error | Code | Cause | Solution |
|-------|------|-------|----------|
| AuthenticationError | 401 | Invalid API key | Check API key in .env |
| PermissionError | 403 | Key lacks permissions | Regenerate API key |
| NotFoundError | 404 | Model not found | Check model name |
| RateLimitError | 429 | Too many requests | Use fallback chain |
| TokenLimitError | 413 | Message too large | Compress or split |
| ValidationError | 400 | Invalid request format | Check request structure |

Server Errors (5xx)

| Error | Code | Cause | Solution |
|-------|------|-------|----------|
| ServerError | 500 | Provider internal error | Retry with fallback |
| ServiceUnavailableError | 503 | Provider down | Use fallback chain |
| GatewayError | 502 | Network issue | Check connection |
| TimeoutError | 504 | Request took too long | Increase timeout |

Network Errors

| Error | Cause | Solution |
|-------|-------|----------|
| ConnectionRefusedError | Proxy not running | Start tokenpak serve |
| ConnectionError | Network unreachable | Check internet connection |
| SSLError | Certificate validation failed | Check CA certificates |

Best Practices

1. Always Use Fallback Chains

provider: anthropic
fallback:
  - google
  - openai

2. Wrap Requests in Try-Catch

try:
    response = client.messages.create(...)
except RateLimitError:
    # Handle rate limit
    pass
except AuthenticationError:
    # Handle auth error
    pass
except Exception as e:
    # Log unexpected errors
    logger.error(f"Unexpected: {e}")

3. Implement Exponential Backoff

TokenPak does this automatically, but for custom retries:

import time

from tokenpak import RateLimitError

def call_with_backoff(fn, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            wait = 2 ** attempt  # 1, 2, 4 seconds
            print(f"Attempt {attempt + 1} failed. Waiting {wait}s...")
            time.sleep(wait)
    raise Exception("All attempts failed")

4. Monitor Token Usage

# Before making request
tokens = client.count_tokens(
    model="claude-opus-4-6",
    messages=messages
)

if tokens > 10000:
    print(f"Warning: {tokens} tokens. Consider compression.")

5. Set Timeouts

client = Client(
    api_key="...",
    timeout=30,  # 30 second timeout
    model="claude-opus-4-6"
)

Getting Help

  • Question? Check this guide or FAQ
  • Bug? Open an issue on GitHub
  • Stuck? Email support or check Discord

Next Steps