Recipe: Budget Caps & Spend Alerts
Status: Conceptual. This recipe describes a spend-guard pattern. The declarative budget: config block, per-user limits, the inline cost_cents / budget_exceeded response fields, and the sync-pricing / reconcile-spend commands referenced in this recipe are illustrative: they are not part of the validated config surface or shipped CLI of the current TokenPak release. The proxy is a byte-preserving passthrough and does not inject cost or error fields into the response body. Treat this as a design sketch, not a copy-paste runbook. Confirm any CLI command against tokenpak --help.
What this solves: Capping daily and monthly spend, and reacting (alert, reject, or downgrade to a cheaper model) when a budget is approached or exceeded.
Prerequisites
- TokenPak installed (tokenpak --help)
- Valid API keys for your providers
- A monitoring sink (logs, webhook, or metrics store) if you want alerts
The pattern (illustrative config)
The idea is to track estimated spend and take an action at thresholds. The YAML below is a conceptual illustration only — it is not the validated TokenPak proxy config schema:
# ILLUSTRATIVE ONLY — not a validated TokenPak config schema
budget:
  enabled: true
  daily_limit_cents: 1000        # $10.00/day
  monthly_limit_cents: 20000     # $200.00/month
  alert_threshold_pct: 80
  alert_webhook: https://monitoring.example.com/budget-alert
  reject_on_exceed: true
  # Optional graceful degradation: route to a cheaper model instead
  fallback_model_on_exceed: gpt-3.5-turbo
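The threshold logic this config implies can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not TokenPak behavior: the `BudgetConfig` class, the `Action` enum, and the `decide` function are hypothetical names, and the precedence (a configured fallback model wins over rejection) is one possible reading of the "route to a cheaper model instead" comment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    ALERT = "alert"        # allow the request, but notify the monitoring sink
    FALLBACK = "fallback"  # route the request to the cheaper model
    REJECT = "reject"


@dataclass(frozen=True)
class BudgetConfig:
    # Hypothetical fields mirroring the illustrative YAML; not a TokenPak schema.
    daily_limit_cents: int = 1000
    monthly_limit_cents: int = 20000
    alert_threshold_pct: int = 80
    reject_on_exceed: bool = True
    fallback_model_on_exceed: Optional[str] = None


def decide(cfg: BudgetConfig, daily_spent_cents: int, monthly_spent_cents: int) -> Action:
    """Return the action for the next request, given estimated spend so far."""
    exceeded = (
        daily_spent_cents >= cfg.daily_limit_cents
        or monthly_spent_cents >= cfg.monthly_limit_cents
    )
    if exceeded:
        # Assumption: graceful degradation takes precedence when configured.
        if cfg.fallback_model_on_exceed:
            return Action.FALLBACK
        return Action.REJECT if cfg.reject_on_exceed else Action.ALERT
    # Integer form of spent / limit >= pct / 100, which avoids float rounding.
    near_limit = (
        daily_spent_cents * 100 >= cfg.daily_limit_cents * cfg.alert_threshold_pct
        or monthly_spent_cents * 100 >= cfg.monthly_limit_cents * cfg.alert_threshold_pct
    )
    return Action.ALERT if near_limit else Action.ALLOW
```

With the default values, `decide(BudgetConfig(), 800, 0)` yields `Action.ALERT` (80% of the daily limit) and `decide(BudgetConfig(), 1000, 0)` yields `Action.REJECT`. Keeping amounts in integer cents sidesteps floating-point drift in the comparisons.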
What's real today
- Start the proxy with tokenpak serve (default http://127.0.0.1:8766).
- TokenPak records usage out-of-band; inspect it with tokenpak status. Cost is not returned inline in the response body; the proxy forwards the upstream response verbatim.
- Validate a proxy config file with tokenpak config-check <file.json>. The declarative budget: block above is not part of that validated surface.
A request against the proxy returns the upstream provider's response unchanged:
tokenpak serve   # http://127.0.0.1:8766
curl -X POST http://127.0.0.1:8766/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-3-5-sonnet-20241022", "max_tokens": 64,
       "messages": [{"role": "user", "content": "Write 100 words"}]}'
To see spend after running requests, use tokenpak status (recorded server-side), rather than reading a cost_cents field from the response.
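Because the upstream response arrives unchanged, an application that wants its own running estimate can compute one from the `usage` object that the Anthropic Messages API includes in each response (`input_tokens` and `output_tokens`). The sketch below assumes a hand-maintained price table. The `PRICES_PER_MTOK` table, the `example-model` name, and the rates in it are placeholders for illustration, not real provider prices, and `estimate_cost_cents` is a hypothetical helper rather than a TokenPak function.

```python
from decimal import Decimal

# Placeholder rates in USD per million tokens. These are NOT real provider
# prices; load current values from an authoritative source in practice.
PRICES_PER_MTOK = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}


def estimate_cost_cents(model: str, usage: dict) -> Decimal:
    """Estimate the cost of one response, in cents, from its token usage."""
    rates = PRICES_PER_MTOK[model]
    usd = (
        Decimal(usage["input_tokens"]) * rates["input"]
        + Decimal(usage["output_tokens"]) * rates["output"]
    ) / Decimal(1_000_000)
    return usd * 100


# A trimmed response body, shaped like the upstream JSON after parsing.
response_body = {
    "model": "example-model",
    "usage": {"input_tokens": 12, "output_tokens": 130},
}
cost = estimate_cost_cents(response_body["model"], response_body["usage"])
print(f"estimated cost: {cost} cents")
```

`Decimal` is used instead of `float` so that many small per-request amounts accumulate without rounding error. The result is only an estimate; the figures recorded by tokenpak status and the provider invoice remain the references to check against.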
Designing a budget guard
If you implement spend control (in your application or your own orchestration), these principles apply:
- Keep cost estimates current. Provider pricing changes; a stale hardcoded rate drifts from reality. Refresh pricing from an authoritative source rather than hardcoding it indefinitely.
- Make alerting reliable. A single webhook with no retry can fail silently — use retries or a secondary channel.
- Apply per-user limits so that no single user can consume the whole budget.
- Leave headroom for in-flight requests — block somewhat below the hard cap so a request already underway can complete.
- Reconcile estimates against real invoices periodically — estimate-based tracking drifts from actual billing over time.
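Several of these principles can be sketched together in application code. The example below is a minimal in-memory illustration under stated assumptions, not part of TokenPak: `BudgetGuard`, `send_alert`, and `drift_pct` are hypothetical names, the guard is not thread-safe, and refreshing pricing is left out. It shows admission with headroom and a per-user ceiling, alert delivery with retries and a secondary channel, and a drift figure for reconciling estimates against an invoice.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class BudgetGuard:
    # All names and values here are hypothetical, chosen for illustration.
    limit_cents: int           # hard cap for the period
    headroom_cents: int        # stop admitting requests this far below the cap
    per_user_limit_cents: int  # ceiling for any single user
    spent_cents: int = 0
    spent_by_user: Dict[str, int] = field(default_factory=dict)

    def admit(self, user: str, estimated_cents: int) -> bool:
        """Admit a request only if the shared and per-user budgets both allow it."""
        soft_cap = self.limit_cents - self.headroom_cents
        user_spent = self.spent_by_user.get(user, 0)
        if self.spent_cents + estimated_cents > soft_cap:
            return False
        if user_spent + estimated_cents > self.per_user_limit_cents:
            return False
        self.spent_cents += estimated_cents
        self.spent_by_user[user] = user_spent + estimated_cents
        return True


def send_alert(message: str,
               channels: Sequence[Callable[[str], None]],
               retries: int = 3,
               backoff_seconds: float = 0.5) -> bool:
    """Try each channel in order with exponential backoff; True on first success."""
    for channel in channels:
        for attempt in range(retries):
            try:
                channel(message)
                return True
            except Exception:
                # Sleep before the next attempt on this channel, if any remain.
                if attempt < retries - 1:
                    time.sleep(backoff_seconds * (2 ** attempt))
    return False


def drift_pct(estimated_cents: int, invoiced_cents: int) -> float:
    """Percent difference of invoice vs. estimate (positive means under-estimated)."""
    if estimated_cents == 0:
        raise ValueError("no estimated spend to compare against")
    return (invoiced_cents - estimated_cents) * 100.0 / estimated_cents
```

A guard created as `BudgetGuard(limit_cents=1000, headroom_cents=100, per_user_limit_cents=400)` stops admitting requests once estimated spend would pass 900 cents, leaving 100 cents for requests already in flight. Passing callables as alert channels keeps the retry logic independent of any particular webhook or logging library.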