Recipe: Budget Caps & Spend Alerts¶
What this solves: Set a daily or monthly spending limit and automatically reject requests when approaching or exceeding the cap.
Prerequisites¶
- TokenPak installed with
budgetplugin enabled - A billing API key or webhook endpoint to track spend
- Access to logs or monitoring system for alerts
- Valid API keys for your providers
Config Snippet¶
# config.yaml
budget:
enabled: true
# Daily cap: $10.00 USD
daily_limit_cents: 1000
# Monthly cap: $200.00 USD
monthly_limit_cents: 20000
# Alert when spend reaches 80% of cap
alert_threshold_pct: 80
alert_webhook: https://monitoring.example.com/budget-alert
# Action when limit exceeded
reject_on_exceed: true
# Optional: graceful degradation (route to cheaper model instead)
fallback_model_on_exceed: gpt-3.5-turbo
# Track spend per user (optional)
per_user_limits:
enabled: true
user_limits:
user-a:
daily_limit_cents: 300
user-b:
daily_limit_cents: 500
providers:
openai:
type: openai
api_key: ${OPENAI_API_KEY}
models:
gpt-4:
provider: openai
estimated_input_cost_per_1k: 3 # cents
estimated_output_cost_per_1k: 6 # cents
gpt-3.5-turbo:
provider: openai
estimated_input_cost_per_1k: 0.5
estimated_output_cost_per_1k: 1.5
Test & Verify¶
Step 1: Validate the config:
tokenpak validate-config config.yaml
# Expected output:
# ✓ Config valid
# ✓ Budget tracking enabled (daily: $10.00, monthly: $200.00)
# ✓ Per-user limits: user-a ($3.00/day), user-b ($5.00/day)
Step 2: Start the proxy and make requests:
export OPENAI_API_KEY="sk-real-key"
tokenpak proxy --config config.yaml
# In another terminal, make 5 requests to consume budget
for i in {1..5}; do
curl -X POST http://localhost:8000/v1/messages \
-H "Authorization: Bearer user-a" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Write 100 words of code"}]}' \
-s | jq '.cost_cents'
done
# Expected output (cumulative):
# 180
# 360
# 540
# 720
# 900 <- This triggers 80% alert
Step 3: Verify alert was sent:
# Check logs for webhook call
tail -f /var/log/tokenpak/alerts.log | grep "budget.*threshold"
# Expected output:
# 2026-03-25T09:45:00Z ALERT budget_threshold_exceeded user=user-a spent=$9.00 daily_limit=$3.00 pct=300%
Step 4: Exceed the limit and verify rejection:
# Make one more request (would exceed daily limit)
curl -X POST http://localhost:8000/v1/messages \
-H "Authorization: Bearer user-a" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "One more request"}]}' \
-s | jq '.error'
# Expected output:
# {
# "code": "budget_exceeded",
# "message": "User user-a has reached daily budget limit ($3.00). Rejecting request.",
# "daily_spent": "$9.00",
# "daily_limit": "$3.00"
# }
What Just Happened¶
TokenPak maintained a running cost tally for each request:
- Request arrives with user
user-arequestinggpt-4 - Cost estimation based on model pricing and estimated token count
- Spend check against
user-a's daily limit ($3.00) - When 80% reached ($2.40), webhook notification sent to your monitoring system
- When limit exceeded ($3.00+), subsequent requests rejected with
budget_exceedederror
Users are aware of budget constraints without application code changes — TokenPak enforces them transparently.
Common Pitfalls¶
Pitfall 1: Cost estimates are inaccurate
- ❌ Wrong: Hardcoded fixed costs that don't match actual provider pricing
- ✅ Right: Query provider APIs hourly for latest pricing: tokenpak sync-pricing --provider openai
Pitfall 2: Alert webhook is unreliable
- ❌ Wrong: Single webhook with no retry logic, silent failures
- ✅ Right: Configure multiple webhooks or fallback to email: alert_webhook: [https://..., mailto://ops@...]
Pitfall 3: Per-user limits are too permissive
- ❌ Wrong: Set limits equal to total budget (allows one user to consume everything)
- ✅ Right: Distribute: total_budget / num_users * 1.2 with monitoring per user
Pitfall 4: Forgetting to account for overage
- ❌ Wrong: Set monthly limit to $100, but requests are blocking at $100 exactly
- ✅ Right: Set limit to $100 - (estimated_max_single_request * 2) to allow completion
Pitfall 5: Cost tracking falls out of sync
- ❌ Wrong: Estimate-based tracking drifts from actual billing
- ✅ Right: Hourly reconciliation with actual provider invoices: tokenpak reconcile-spend --provider openai