Recipe: Cost Monitoring & Observability

What this solves: Export TokenPak usage and cost metrics to Prometheus and visualize them in Grafana, giving you real-time dashboards and alerts on spending trends.

Prerequisites

TokenPak installed
Prometheus and Grafana running (local or cloud)
API keys for providers
Understanding of metrics (requests, tokens, cost)

Config Snippet

# config.yaml
metrics:
  enabled: true

  # Export to Prometheus
  prometheus:
    enabled: true
    port: 8001  # Metrics endpoint: http://localhost:8001/metrics
    push_interval_seconds: 60  # Push interval in seconds (Prometheus itself pulls from the endpoint above; see prometheus.yml)

  # Dimensions to track
  track_by:
    - user_id
    - model
    - provider
    - endpoint
    - status_code

  # Custom metrics
  custom_metrics:
    - name: tokenpak_cost_by_user_daily
      type: gauge
      dimension: user_id
      period: 1d

    - name: tokenpak_tokens_per_request
      type: histogram
      buckets: [100, 250, 500, 1000, 2000, 5000, 10000]

    - name: tokenpak_latency_by_provider
      type: histogram
      dimension: provider
      buckets: [10, 50, 100, 250, 500, 1000, 2000]

providers:
  openai:
    type: openai
    api_key: ${OPENAI_API_KEY}

  anthropic:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}

models:
  # Example prices; verify units and values against each provider's current price list
  gpt-4: { provider: openai, cost_per_1k_input: 3, cost_per_1k_output: 6 }
  gpt-3.5-turbo: { provider: openai, cost_per_1k_input: 0.5, cost_per_1k_output: 1.5 }
  claude-opus: { provider: anthropic, cost_per_1k_input: 15, cost_per_1k_output: 75 }
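
To make the pricing fields concrete, here is a minimal sketch of how a per-request cost could be derived from them. It assumes cost is linear in tokens and that `cost_per_1k_input` / `cost_per_1k_output` are prices per 1,000 tokens in whatever currency unit the config uses. The `PRICING` dict and `request_cost` function are hypothetical names for illustration, not TokenPak's internals.

```python
# Hypothetical pricing table mirroring the gpt-4 entry in config.yaml.
PRICING = {
    "gpt-4": {"cost_per_1k_input": 3, "cost_per_1k_output": 6},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request, assuming linear per-1k-token pricing."""
    price = PRICING[model]
    input_cost = input_tokens / 1000 * price["cost_per_1k_input"]
    output_cost = output_tokens / 1000 * price["cost_per_1k_output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # 1,000 input tokens and 500 output tokens on gpt-4
    print(request_cost("gpt-4", 1000, 500))
```

Because the result is only as good as the price table, it is worth checking that every model entry uses the same unit before trusting a dashboard total.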

Prometheus scrape config (prometheus.yml):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: tokenpak
    static_configs:
      - targets: ['localhost:8001']

Grafana dashboard example:

{
  "dashboard": {
    "title": "TokenPak Cost Monitoring",
    "panels": [
      {
        "title": "Daily Cost by User",
        "targets": [
          {
            "expr": "tokenpak_cost_by_user_daily",
            "legendFormat": "{{ user_id }}"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Total Spend (24h)",
        "targets": [
          {
            "expr": "sum(increase(tokenpak_cost_usd_total[24h]))"
          }
        ],
        "type": "stat"
      },
      {
        "title": "Requests by Model",
        "targets": [
          {
            "expr": "sum(rate(tokenpak_requests_total[5m])) by (model)"
          }
        ],
        "type": "piechart"
      }
    ]
  }
}

Test & Verify

Step 1: Start Prometheus and Grafana locally:

# Start Prometheus (example, adjust path)
prometheus --config.file=prometheus.yml &

# Start Grafana (Docker example)
docker run -p 3000:3000 grafana/grafana &

Step 2: Start TokenPak proxy:

tokenpak proxy --config config.yaml
# Metrics available at: http://localhost:8001/metrics

Step 3: Make some requests to generate metrics:

for i in {1..10}; do
  curl -X POST http://localhost:8000/v1/messages \
    -H "Authorization: Bearer user-$((i % 3))" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}' \
    -s > /dev/null &
done
wait

Step 4: Verify metrics are being exported:

curl -s http://localhost:8001/metrics | grep tokenpak
# Expected output:
# tokenpak_requests_total{model="gpt-4",provider="openai",status="200"} 10
# tokenpak_cost_usd_total{model="gpt-4",provider="openai"} 0.45
# tokenpak_tokens_sent_total{model="gpt-4"} 245
# tokenpak_tokens_received_total{model="gpt-4"} 1203
# tokenpak_latency_seconds_bucket{le="0.05",provider="openai"} 2
# tokenpak_latency_seconds_bucket{le="0.1",provider="openai"} 8
# tokenpak_latency_seconds_bucket{le="+Inf",provider="openai"} 10
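
If you want to check the endpoint from a script rather than by eye, the text format is simple enough to parse with the standard library. The sketch below is a simplified parser, not a full implementation of the Prometheus exposition format: it ignores timestamps and escaped quotes inside label values. The `SAMPLE` text is copied from the expected output above, and `parse_metrics` and `total` are hypothetical helper names.

```python
import re

# Sample lines taken from the expected output in Step 4.
SAMPLE = '''\
tokenpak_requests_total{model="gpt-4",provider="openai",status="200"} 10
tokenpak_cost_usd_total{model="gpt-4",provider="openai"} 0.45
tokenpak_latency_seconds_bucket{le="+Inf",provider="openai"} 10
'''

# name, optional {labels}, then a single value token
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot handle
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, metric):
    """Sum the values of every series belonging to one metric name."""
    return sum(value for name, _, value in samples if name == metric)


if __name__ == "__main__":
    parsed = parse_metrics(SAMPLE)
    print(total(parsed, "tokenpak_requests_total"))
```

In a real check you would feed it the body returned by http://localhost:8001/metrics instead of `SAMPLE`.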

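Step 5: View in the Grafana dashboard:

- Log in to Grafana (http://localhost:3000, default credentials admin/admin)
- Add a Prometheus data source (http://localhost:9090). If Grafana runs in Docker as in Step 1, localhost inside the container does not point at the host; on Docker Desktop, http://host.docker.internal:9090 usually works instead
- Create a panel with the query: tokenpak_cost_by_user_daily
- Expected: a gauge showing per-user cost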

Step 6: Set up cost alerting rule:

# alert-rules.yaml (Prometheus)
groups:
  - name: tokenpak_alerts
    rules:
      - alert: HighDailyCost
        expr: sum(increase(tokenpak_cost_usd_total[24h])) > 50
        for: 5m
        annotations:
          summary: "Daily TokenPak cost exceeded $50"
          description: "Current 24h spend: {{ $value }}"

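      - alert: UnusualLatency
        # A histogram exposes _bucket/_sum/_count series, so alert on an estimated p95
        expr: histogram_quantile(0.95, sum by (le, provider) (rate(tokenpak_latency_seconds_bucket[5m]))) > 5
        for: 2m
        annotations:
          summary: "TokenPak p95 latency above 5s (unusual)"
          description: "Provider: {{ $labels.provider }}, p95 latency: {{ $value }}s"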
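
The HighDailyCost rule works because the cost metric is a counter and `increase()` measures how much it grew over the window, even if the process restarted and the counter fell back to zero. The sketch below illustrates that reset handling only; it is a simplification that leaves out the extrapolation Prometheus applies at the window edges. `counter_increase` is a hypothetical name.

```python
def counter_increase(samples):
    """Approximate growth of a counter across a list of scraped values.

    A drop between consecutive samples is treated as a reset to zero,
    so the whole new value counts as growth since the reset.
    """
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter restarted from 0
    return increase


if __name__ == "__main__":
    # The counter climbs to 15, the process restarts, then it climbs to 4.
    print(counter_increase([10, 12, 15, 2, 4]))
```

This is also why alerting on the raw counter value would be misleading: it only ever reflects spend since the last restart.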

What Just Happened

TokenPak continuously exported metrics in Prometheus text format:

Each request increments counters such as tokenpak_requests_total
Cost is calculated from token counts and model pricing, then exported as tokenpak_cost_usd_total
Latencies are recorded in histogram buckets so percentiles can be estimated later
Prometheus scrapes metrics every 15 seconds (configurable)
Grafana queries Prometheus and renders dashboards
Alerts fire when thresholds (e.g., daily cost > $50) are exceeded
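
To show what "percentile analysis" on histogram buckets means in practice, here is a minimal sketch of the linear interpolation that Prometheus's `histogram_quantile()` performs on cumulative buckets. It is an approximation for illustration, using the latency buckets from the Step 4 expected output; the Python function name simply mirrors the PromQL one.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    with the last bound being +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # No upper limit to interpolate toward: report the
                # highest finite bound instead.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    latency_buckets = [(0.05, 2), (0.1, 8), (math.inf, 10)]
    print(histogram_quantile(0.5, latency_buckets))
```

The estimate is only as fine-grained as the bucket boundaries, which is why bucket choice in config.yaml matters for latency alerts.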

You have a real-time view of what you're spending and where — no manual billing exports, no surprise invoices.

Common Pitfalls

Pitfall 1: Metrics are high-cardinality
- ❌ Wrong: Track every user_id as its own label value when you have thousands of users (1000s of series = Prometheus overload)
- ✅ Right: Aggregate by tier: track_by: [user_tier, model] (10s of series)
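
A quick way to sanity-check a `track_by` list is to multiply the number of distinct values per label, which gives a worst-case series count per metric. The sketch below does exactly that; the user, tier, and model counts are hypothetical numbers chosen for illustration, and `estimate_series` is a hypothetical helper.

```python
import math


def estimate_series(label_cardinalities):
    """Upper bound on series per metric: product of distinct values per label."""
    return math.prod(label_cardinalities.values())


if __name__ == "__main__":
    # Hypothetical deployment: 5,000 users, 3 tiers, 3 models.
    print(estimate_series({"user_id": 5000, "model": 3}))
    print(estimate_series({"user_tier": 3, "model": 3}))
```

The real count is usually lower because not every label combination occurs, but the product shows how quickly each extra dimension multiplies the load.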

Pitfall 2: Scrape interval too frequent
- ❌ Wrong: scrape_interval: 1s (high overhead, no additional insight)
- ✅ Right: scrape_interval: 15s to 30s (captures trends, manageable load)

Pitfall 3: Alert thresholds are arbitrary
- ❌ Wrong: Alert at $100/day (but you budgeted $500)
- ✅ Right: Alert at 75% of your daily budget: > (daily_budget * 0.75)
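
One way to keep the threshold tied to the budget is to generate the alert expression from the budget figure instead of hard-coding a number. This is a minimal sketch: `build_cost_alert_expr` is a hypothetical helper, and the metric name is the one shown in the Step 4 output.

```python
def build_cost_alert_expr(daily_budget_usd, fraction=0.75,
                          metric="tokenpak_cost_usd_total"):
    """Build a PromQL expression that fires at a fraction of the daily budget."""
    threshold = daily_budget_usd * fraction
    # :g drops a trailing ".0" so the rule reads naturally
    return f"sum(increase({metric}[24h])) > {threshold:g}"


if __name__ == "__main__":
    # A $500/day budget yields a $375 alert threshold.
    print(build_cost_alert_expr(500))
```

The resulting string can be pasted into the `expr` field of a rule in alert-rules.yaml whenever the budget changes.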

Pitfall 4: Missing cost dimension
- ❌ Wrong: Only track request counts (requests ≠ cost)
- ✅ Right: Always export cost alongside requests: tokenpak_cost_usd_total

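Pitfall 5: No long-term retention
- ❌ Wrong: Rely on Prometheus's default 15d retention (older trends disappear)
- ✅ Right: Raise retention with --storage.tsdb.retention.time, or ship metrics to long-term storage (for example with Prometheus remote_write) such as an S3-backed store, InfluxDB, or Datadog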

Pitfall 6: Forgetting user dimension
- ❌ Wrong: Only per-model cost (you can't tell whether one user is abusing the service)
- ✅ Right: Always include a user dimension for billing audits: user_tier, or user_id when the number of users is small enough to avoid Pitfall 1