Recipe: Local Development with Mock Provider¶
What this solves: Use a mock/stub provider in development to avoid API costs and rate limits while testing, then switch to real providers in production with zero code changes.
Prerequisites¶
- TokenPak installed
- A local Python environment for testing
- Understanding of mock responses (deterministic, predictable)
Config Snippet¶
```yaml
# config.yaml (local development)
providers:
  # Mock provider: responds instantly with fake data
  mock:
    type: mock
    # Mock responses follow two rules:
    #   - latency: fake delay (simulate a real provider)
    #   - deterministic: same input = same output
    latency_ms: 200  # Simulate 200ms API latency
    # Canned responses (by model)
    responses:
      gpt-4:
        default: "Mock GPT-4 response: [mock output for testing]"
        # Override by keyword
        patterns:
          debug: "Mock response: Debugged your code successfully"
          refactor: "Mock response: Code refactored for clarity"
      claude-3-sonnet:
        default: "Mock Claude response: [test output]"
        patterns:
          explain: "Mock response: Explained the concept clearly"
  # Real providers: configured but not used in dev
  openai:
    type: openai
    api_key: ${OPENAI_API_KEY}
    enabled: false  # Disabled in dev
  anthropic:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    enabled: false
models:
  # Development: use mock
  gpt-4:
    provider: mock
    fallback_provider: mock  # Never fall back to a real API in dev
  gpt-3.5-turbo:
    provider: mock
    fallback_provider: mock
  claude-3-sonnet:
    provider: mock
    fallback_provider: mock
  # Real-provider routes commented out for dev
  # gpt-4-prod: { provider: openai }
  # claude-3-sonnet-prod: { provider: anthropic }
```
Production config (config.prod.yaml):

```yaml
providers:
  openai:
    type: openai
    api_key: ${OPENAI_API_KEY}
    enabled: true
  anthropic:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    enabled: true
  # Mock disabled in production
  mock:
    type: mock
    enabled: false
models:
  gpt-4: { provider: openai, fallback_provider: anthropic }
  gpt-3.5-turbo: { provider: openai, fallback_provider: anthropic }
  claude-3-sonnet: { provider: anthropic, fallback_provider: openai }
```
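To avoid hand-picking the file at deploy time, the config can be selected from an environment variable. A minimal sketch; the `ENV` variable and the wrapper script are assumptions here, not TokenPak conventions:

```shell
# pick_config.sh - hypothetical wrapper that selects the config from $ENV
CONFIG="config.yaml"                      # default: dev config (mocks)
if [ "$ENV" = "prod" ]; then
  CONFIG="config.prod.yaml"               # production: real providers
fi
echo "tokenpak proxy --config $CONFIG"    # print the command; pipe to sh to run
```

Running with `ENV=prod` selects `config.prod.yaml`; anything else falls back to the dev config with mocks.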
Test & Verify¶
Step 1: Validate the dev config:

```bash
tokenpak validate-config config.yaml
# Expected output:
# ✓ Config valid
# ✓ Providers: mock (enabled)
# ✓ Real providers: openai (disabled), anthropic (disabled)
# ✓ All models route to mock
```
Step 2: Start the proxy in dev mode:

```bash
tokenpak proxy --config config.yaml
# Should start instantly (no API key validation)
```
Step 3: Make a request (mock response, near-instant):

```bash
time curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Debug my code"}]
  }' -s | jq '.content'
# Expected output:
# "Mock response: Debugged your code successfully"
# real ~0.2s (dominated by the configured mock latency)
```
Step 4: Make many requests without API costs:

```bash
# Simulate high-volume testing.
# Note: the JSON body uses double quotes so $i actually expands;
# inside single quotes it would be sent literally as "request $i".
for i in {1..100}; do
  curl -X POST http://localhost:8000/v1/messages \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"gpt-4\", \"messages\": [{\"role\": \"user\", \"content\": \"request $i\"}]}" \
    -s > /dev/null
done
echo 'Made 100 requests, $0 cost!'   # single quotes: $0 stays literal
```
Step 5: Verify no real API calls were made (check the logs):

```bash
tokenpak logs --provider openai
# Expected output: empty (no real API calls)
tokenpak logs --provider mock
# Expected output: 100 calls to the mock provider
```
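Step 5 can also be turned into a CI guard that fails the build if any request reached a real provider. A sketch, assuming the `tokenpak logs` command shown above and that it prints nothing when no calls were made:

```shell
# ci_guard.sh - hypothetical check: fail if the openai log is non-empty
# (exact log command and output format are assumptions)
LOG_OUTPUT=$(tokenpak logs --provider openai 2>/dev/null || true)
if [ -n "$LOG_OUTPUT" ]; then
  echo "Real API calls detected during dev/test run!" >&2
  exit 1
fi
echo "OK: no real API calls"
```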
Step 6: Switch to the production config:

```bash
# Stop the dev proxy
pkill -f "tokenpak proxy --config config.yaml"
# Start the production proxy
tokenpak proxy --config config.prod.yaml
# Real API calls will now be made
```
Step 7: Verify the switch worked:

```bash
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Real request"}]}' \
  -s | jq '.cost_cents'
# Expected output: non-zero cost (real API call)
```
Integration Example (Python)¶
```python
# app.py - the same code works in dev and prod
import os

import requests

def get_ai_response(prompt):
    # Routing is decided by the proxy's config, not by this code
    response = requests.post(
        'http://localhost:8000/v1/messages',
        json={
            'model': 'gpt-4',  # routed to mock in dev, a real API in prod
            'messages': [{'role': 'user', 'content': prompt}]
        }
    )
    return response.json()['content']

# Dev: fast, free, deterministic
# Prod: slower, costs money, real answers
if __name__ == '__main__':
    mode = os.getenv('ENV', 'dev')  # informational only; routing lives in the proxy config
    print(f"Running in {mode} mode")
    print(get_ai_response("Hello"))
```
Run in dev:

```bash
ENV=dev python app.py
# Output: Mock GPT-4 response: [mock output for testing]
# No API calls, near-instant
```

Run in prod:

```bash
ENV=prod python app.py
# Output: real GPT-4 response
# Costs money, realistic latency
```
What Just Happened¶
TokenPak routed your request to the mock provider in development:
1. The request arrives with model `gpt-4`
2. The provider lookup resolves `gpt-4 → mock`
3. The mock provider returns a canned response instantly
4. The client receives the response without any real API call

In production, the same code routes to real providers; no application changes are needed.
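The keyword lookup behind step 2 of the mock's response selection can be pictured in a few lines of Python. This is an illustration of the pattern-matching idea from the config above, not TokenPak's actual implementation (case-insensitive matching, for example, is an assumption here):

```python
# Minimal sketch of pattern-based canned responses (illustration only)
MOCK_RESPONSES = {
    "gpt-4": {
        "default": "Mock GPT-4 response: [mock output for testing]",
        "patterns": {
            "debug": "Mock response: Debugged your code successfully",
            "refactor": "Mock response: Code refactored for clarity",
        },
    },
}

def mock_reply(model: str, prompt: str) -> str:
    """Return the first pattern whose keyword appears in the prompt,
    falling back to the model's default canned response."""
    entry = MOCK_RESPONSES[model]
    for keyword, reply in entry["patterns"].items():
        if keyword in prompt.lower():
            return reply
    return entry["default"]

print(mock_reply("gpt-4", "Debug my code"))
# → Mock response: Debugged your code successfully
```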
Common Pitfalls¶
Pitfall 1: Mock responses are too different from reality
- ❌ Wrong: Mock always says "success"; the real API has variability
- ✅ Right: Make mocks realistic: include error cases and vary response lengths
Pitfall 2: Forgetting to switch configs
- ❌ Wrong: Deploy to production with config.yaml (dev mocks enabled)
- ✅ Right: CI/CD enforces config.prod.yaml on production deployments
Pitfall 3: Mock latency is too low
- ❌ Wrong: latency_ms: 0 (tests pass in dev, timeout in prod)
- ✅ Right: latency_ms of 200 to 500 (realistic; catches slow code paths)
Pitfall 4: Real providers still enabled in dev
- ❌ Wrong: enabled: true for OpenAI in dev, can accidentally burn budget
- ✅ Right: Explicitly enabled: false for real providers in dev config
Pitfall 5: Mock responses are too static
- ❌ Wrong: Every request returns identical response
- ✅ Right: Vary by prompt keyword: patterns: {debug: "...", refactor: "..."}
Pitfall 6: No way to test fallback logic
- ❌ Wrong: Can't test "what if primary provider fails?" in dev
- ✅ Right: Add mock provider with failure_rate: 0.2 to test fallbacks
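Building on Pitfall 6, a dev config for exercising fallbacks might look like the fragment below. `failure_rate` is taken from the pitfall above; the second mock provider name is an illustration:

```yaml
providers:
  flaky-mock:
    type: mock
    failure_rate: 0.2      # ~20% of requests return a simulated provider error
  mock:
    type: mock             # healthy mock absorbs the fallbacks
models:
  gpt-4:
    provider: flaky-mock
    fallback_provider: mock
```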