Recipe: Cost Monitoring & Observability
Status: Conceptual. This recipe describes a cost-observability pattern. The declarative metrics:/prometheus:/custom_metrics: config block, a built-in Prometheus metrics endpoint, and the specific tokenpak_* metric names shown below are illustrative. They are not part of the validated config surface of the current TokenPak release, and the proxy does not expose a built-in Prometheus exporter (there is no :8001/metrics endpoint). The shipped way to see usage and cost is tokenpak status. Confirm any CLI command against tokenpak --help.
What this solves: visibility into request volume, token usage, and cost over time.
Prerequisites
- TokenPak installed (check with tokenpak --help)
- API keys for your providers
- (For the conceptual dashboard pattern) a metrics store such as Prometheus + Grafana
What's real today
The shipped proxy listens on http://127.0.0.1:8766 and records usage out-of-band. To see it:
```shell
tokenpak serve     # proxy on http://127.0.0.1:8766
# After sending some traffic through the proxy:
tokenpak status    # recorded request, token, and cost summary
```
TokenPak's proxy is a byte-preserving passthrough — it does not inject cost fields into the response body, and it does not run a separate Prometheus metrics server. Usage/cost accounting is recorded server-side and surfaced via the CLI (tokenpak status), not scraped from a :8001/metrics endpoint.
The pattern (illustrative — external metrics pipeline)
If you want dashboards in Prometheus/Grafana, the conceptual pattern is to export TokenPak's recorded usage into a metrics store with your own shim (for example, a small exporter that reads TokenPak's recorded usage and republishes it as Prometheus metrics). The config and metric names below are illustrative only — they do not describe a built-in TokenPak exporter:
```yaml
# ILLUSTRATIVE ONLY — not a validated TokenPak config schema,
# and not a built-in TokenPak metrics exporter.
metrics:
  enabled: true
  prometheus:
    enabled: true
    port: 9464   # your own exporter's port (example)
  track_by: [user_id, model, provider, status_code]
```
```yaml
# Example Prometheus scrape config pointing at YOUR exporter (illustrative):
scrape_configs:
  - job_name: tokenpak
    static_configs:
      - targets: ['localhost:9464']   # your exporter, not a TokenPak built-in
```
Example metric series your exporter might publish (names and values are illustrative, not emitted by TokenPak itself):
```text
# ILLUSTRATIVE metric shapes — define and emit these yourself
tokenpak_requests_total{model="...",provider="...",status="200"}
tokenpak_cost_usd_total{model="...",provider="..."}
tokenpak_tokens_sent_total{model="..."}
```
Designing cost observability
Whether you use tokenpak status directly or build an external pipeline, these principles apply:
- Watch cardinality. Per-user-id series can explode; aggregate by tier or model where you can.
- Don't scrape too aggressively. A modest interval (Prometheus defaults to one minute) captures cost trends without adding load to your exporter or metrics store.
- Tie alert thresholds to your budget, not to arbitrary round numbers.
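- Always track cost, not just request counts. Request volume is a poor proxy for spend, because cost per request varies with model and token count.
- Retain long-term data if you need historical trends. Short default retention loses history (Prometheus, for example, keeps 15 days by default).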
- Keep a user or tier dimension so you can spot a single heavy consumer.
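Several of these principles reduce to simple arithmetic. The Python sketch below estimates worst-case series counts from label cardinalities, shows the per-day sample load for a scrape interval, collapses a per-user cost map into tiers, and derives a daily alert threshold from a monthly budget. The label counts, the USER_TIER mapping, the budget, and the 80% alert fraction are all hypothetical examples, not TokenPak settings.

```python
# Sketch of cardinality, retention-load, and budget arithmetic.
# All label counts, tiers, and budget figures are HYPOTHETICAL examples.
from collections import defaultdict
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    # Worst case: one time series per combination of label values.
    return prod(label_cardinalities.values())


def samples_per_day(series: int, scrape_interval_s: int) -> int:
    # Each series yields one sample per scrape (floored to whole scrapes).
    return series * (86_400 // scrape_interval_s)


# Hypothetical mapping from user_id to a billing tier.
USER_TIER = {"alice": "pro", "bob": "free", "carol": "pro"}


def cost_by_tier(cost_by_user: dict[str, float]) -> dict[str, float]:
    # Replace the high-cardinality user_id dimension with a low-cardinality tier.
    totals: dict[str, float] = defaultdict(float)
    for user, usd in cost_by_user.items():
        totals[USER_TIER.get(user, "unknown")] += usd
    return dict(totals)


def daily_alert_threshold(monthly_budget_usd: float, days: int = 30,
                          fraction: float = 0.8) -> float:
    # Alert when one day's spend reaches `fraction` of an even daily share of the budget.
    return monthly_budget_usd / days * fraction


if __name__ == "__main__":
    per_user = series_count({"user_id": 5000, "model": 4, "provider": 2, "status_code": 5})
    per_tier = series_count({"tier": 3, "model": 4, "provider": 2, "status_code": 5})
    print(per_user, per_tier, samples_per_day(per_tier, 60))
    print(cost_by_tier({"alice": 12.5, "bob": 0.5, "carol": 7.5, "dave": 1.0}))
    print(daily_alert_threshold(3000.0))
```

With 5,000 users, 4 models, 2 providers, and 5 status codes, the user_id dimension allows up to 200,000 series; swapping it for three tiers caps the same metric at 120, while still letting a heavy tier stand out.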