TokenPak Component Diagram
This document presents a component diagram of TokenPak's internal modules and describes how they relate to one another.
Simplified Component View
graph TD
Client["Client Application"]
subgraph TokenPak["TokenPak Proxy Core"]
Router["Request Router<br/>(Entry Point)"]
VG["Validation Gate<br/>(Security)"]
TC["Token Counter<br/>(Cost Calc)"]
Cache["Cache Manager<br/>(Response + Prompt)"]
RL["Rate Limiter<br/>(Quota)"]
PR["Provider Router<br/>(Selection)"]
end
subgraph Support["Support Modules"]
Adapter["FormatAdapter<br/>(OpenAI ↔ Native)"]
CB["Circuit Breaker<br/>(Health Check)"]
Trace["Tracing<br/>(Debug)"]
Stats["Monitor<br/>(Stats/Metrics)"]
end
subgraph External["External APIs"]
Anthropic["Anthropic API"]
OpenAI["OpenAI API"]
Others["Other Providers"]
end
Client -->|HTTP| Router
Router -->|Request Flow| Adapter
Adapter -->|Check| VG
VG -->|Valid| TC
TC -->|Count Tokens| Cache
Cache -->|Cache Check| RL
RL -->|Rate Check| PR
PR -->|Select Provider| CB
CB -->|Route| Anthropic
CB -->|Route| OpenAI
CB -->|Route| Others
Router -.->|Tracing| Trace
TC -.->|Usage Data| Stats
RL -.->|Limits| Stats
Cache -.->|Cache Stats| Stats
CB -.->|Health| Stats
Anthropic -->|Response| PR
OpenAI -->|Response| PR
Others -->|Response| PR
PR -->|Response| Cache
Cache -->|Response| Client
Component Descriptions
Core Request Pipeline
1. Request Router
- What it does: Receives incoming HTTP requests, validates them, and extracts the model name and user intent
- Key methods: handle_request(), parse_body(), extract_intent()
- Integrates with: FormatAdapter, Tracing
2. FormatAdapter
- What it does: Converts between OpenAI-compatible format and TokenPak's native format, so the rest of the proxy doesn't care what the client sent
- Key methods: adapt_to_native(), adapt_from_native(), normalize_headers()
- Integrates with: Request Router, Provider Router
3. Validation Gate
- What it does: Inspects request/response content against configured policies; detects risky, non-compliant, or suspicious messages
- Key methods: check_content(), classify_risk(), apply_policy()
- Integrates with: Request Router, Cache Manager
4. Token Counter (VaultIndex)
- What it does: Counts tokens accurately per provider, accounts for prompt cache tokens (cache reads are billed at 1/4 of the normal rate, cache creation at the full rate), and feeds the results into cost calculations
- Key methods: count_tokens(), estimate_cache_tokens(), calculate_cost()
- Integrates with: Monitor, Cache Manager
5. Cache Manager
- What it does: Stores and retrieves responses, supports exact-match and semantic deduplication, injects prompt cache headers
- Key methods: lookup_cache(), store_response(), compute_semantic_hash(), inject_cache_headers()
- Integrates with: Validation Gate, Token Counter, Monitor
6. Rate Limiter
- What it does: Enforces per-IP limits, per-model limits, and cost budgets; prevents abuse and runaway spending
- Key methods: check_limit(), consume_quota(), is_within_budget()
- Integrates with: Provider Router, Monitor
7. Provider Router
- What it does: Selects which LLM provider to use based on routing rules, provider health, and failover policy
- Key methods: select_provider(), apply_routing_rules(), get_healthy_providers()
- Integrates with: Circuit Breaker, Rate Limiter
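The cache-aware pricing rule described for the Token Counter can be illustrated with a little arithmetic. The sketch below is not TokenPak's implementation: the name `calculate_cost` mirrors the method listed above, but its signature, the `Usage` type, and the example prices are assumptions made for illustration. It also assumes that "1/4" is relative to the regular input-token price.

```python
from dataclasses import dataclass

# From the description above: cache reads cost 1/4, cache creation costs full.
CACHE_READ_DISCOUNT = 0.25


@dataclass
class Usage:
    """Hypothetical container for the token counts of one request."""
    input_tokens: int           # prompt tokens that were not cached
    cache_read_tokens: int      # prompt tokens served from the prompt cache
    cache_creation_tokens: int  # prompt tokens written to the prompt cache
    output_tokens: int


def calculate_cost(usage: Usage,
                   input_price_per_mtok: float,
                   output_price_per_mtok: float) -> float:
    """Return the request cost; prices are per million tokens (assumed unit)."""
    billable_input = (
        usage.input_tokens
        + usage.cache_creation_tokens                    # billed at the full rate
        + usage.cache_read_tokens * CACHE_READ_DISCOUNT  # billed at a quarter
    )
    total = (billable_input * input_price_per_mtok
             + usage.output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Example with made-up prices: 1,000 fresh tokens, 4,000 cached, 500 output.
usage = Usage(input_tokens=1_000, cache_read_tokens=4_000,
              cache_creation_tokens=0, output_tokens=500)
print(calculate_cost(usage, input_price_per_mtok=3.0, output_price_per_mtok=15.0))
```

With these made-up prices, the 4,000 cached tokens are billed like 1,000 regular ones, so the request costs 0.0135 instead of the 0.0225 it would cost with no cache reads.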
Support Modules
8. Circuit Breaker
- What it does: Monitors provider health (latency, error rates, API availability); detects when a provider is degraded and routes around it
- Key methods: check_health(), record_failure(), record_success(), is_provider_healthy()
- Integrates with: Provider Router, Monitor
9. Monitor (Stats/Metrics)
- What it does: Aggregates real-time statistics: token usage, cost, cache hit rate, latency, provider health
- Key methods: record_usage(), record_cache_hit(), record_latency(), export_stats()
- Integrates with: Token Counter, Cache Manager, Rate Limiter, Circuit Breaker
10. Tracing (StageTrace & PipelineTrace)
- What it does: Records request journey through the proxy for debugging and performance analysis; traces cache hits, provider selection, token counting
- Key methods: start_trace(), log_stage(), end_trace()
- Integrates with: Request Router, all core components
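To make the Circuit Breaker's role more concrete, here is a minimal sketch of the general circuit-breaker pattern. It reuses the method names listed above (`record_failure()`, `record_success()`, `is_provider_healthy()`), but the constructor parameters, the consecutive-failure threshold, and the cooldown behavior are assumptions. The real module also considers latency and error rates, which this simplification ignores.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """Simplified breaker: opens after N consecutive failures (assumed policy)."""

    def __init__(self,
                 failure_threshold: int = 3,
                 cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock                      # injectable for testing
        self._failures: Dict[str, int] = {}      # consecutive failures per provider
        self._opened_at: Dict[str, float] = {}   # time the breaker opened

    def record_failure(self, provider: str) -> None:
        count = self._failures.get(provider, 0) + 1
        self._failures[provider] = count
        if count >= self.failure_threshold:
            # Open (or re-open) the breaker and restart the cooldown.
            self._opened_at[provider] = self._clock()

    def record_success(self, provider: str) -> None:
        # Any success resets the provider to a fully healthy state.
        self._failures.pop(provider, None)
        self._opened_at.pop(provider, None)

    def is_provider_healthy(self, provider: str) -> bool:
        opened_at = self._opened_at.get(provider)
        if opened_at is None:
            return True
        # Once the cooldown has elapsed, let traffic through again as a trial.
        return self._clock() - opened_at >= self.cooldown_seconds
```

In this sketch the Provider Router would call `is_provider_healthy()` before selecting a provider and skip any provider for which it returns `False`.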
Data Flow Summary
User Request
↓
[Request Router] — parses request
↓
[FormatAdapter] — converts format if needed
↓
[Validation Gate] — checks content safety
↓
[Cache Manager] — checks for cached response
├─ Hit? → Return cached response (0 tokens) → Send to user
└─ Miss?
↓
[Rate Limiter] — checks quota
↓
[Provider Router] — selects provider
↓
[Circuit Breaker] — verifies provider health
↓
[External API] — sends request to provider
↓
[Response Received]
↓
[Token Counter] — counts input/output tokens
↓
[Cache Manager] — stores response for future use
↓
[Monitor] — records stats (cost, latency, cache, etc.)
↓
Send response to user
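The flow above can be sketched as a single function that short-circuits on a cache hit. This is an illustrative skeleton only: the `handle_request` signature, the dictionary-backed cache, the `QuotaExceeded` exception, and the stage labels are all hypothetical, and each stage is reduced to a trace entry so that the ordering is easy to see.

```python
from typing import Callable, Dict, List


class QuotaExceeded(Exception):
    """Hypothetical error for requests rejected by the rate limiter."""


def handle_request(prompt: str,
                   cache: Dict[str, str],
                   within_quota: Callable[[], bool],
                   call_provider: Callable[[str], str],
                   trace: List[str]) -> str:
    """Walk one request through the stages in the order of the summary above."""
    trace.append("request_router")    # parse the request
    trace.append("format_adapter")    # convert format if needed
    trace.append("validation_gate")   # content checks

    trace.append("cache_lookup")
    if prompt in cache:
        return cache[prompt]          # hit: no provider call, no tokens spent

    trace.append("rate_limiter")
    if not within_quota():
        raise QuotaExceeded("request rejected before reaching a provider")

    trace.append("provider_router")   # select a provider
    trace.append("circuit_breaker")   # verify provider health
    response = call_provider(prompt)  # external API call

    trace.append("token_counter")     # count input/output tokens
    cache[prompt] = response          # store for future requests
    trace.append("cache_store")
    trace.append("monitor")           # record cost, latency, cache stats
    return response
```

Calling the function twice with the same prompt shows the difference between the two branches: the first call records every stage through `monitor`, while the second stops at `cache_lookup` and never invokes `call_provider`.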
Performance Characteristics
| Component | Latency Impact | Notes |
|---|---|---|
| Request Router | < 1ms | Parsing only |
| FormatAdapter | < 1ms | Format conversion |
| Validation Gate | 10-50ms | Depends on content size & policy complexity |
| Cache Lookup | < 5ms | Local database query |
| Token Counter | 5-20ms | Tokenizer operation |
| Rate Limiter | < 1ms | Hash table lookup |
| Provider Router | < 1ms | In-memory logic |
| Circuit Breaker | < 1ms | In-memory health state |
Typical end-to-end latency on a cache miss: 150-500ms, most of it spent waiting for the provider response.
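As a rough cross-check of the table, the per-stage figures can be summed to estimate the proxy's own overhead. The sketch below assumes that the stages run serially and that an entry such as "< 1ms" can be treated as the range 0 to 1 ms; both are simplifications, not measured behavior.

```python
# (lower_ms, upper_ms) per stage, copied from the table above.
# "< N ms" entries are treated as the range 0..N (an assumption).
STAGE_LATENCY_MS = {
    "Request Router": (0, 1),
    "FormatAdapter": (0, 1),
    "Validation Gate": (10, 50),
    "Cache Lookup": (0, 5),
    "Token Counter": (5, 20),
    "Rate Limiter": (0, 1),
    "Provider Router": (0, 1),
    "Circuit Breaker": (0, 1),
}


def proxy_overhead_ms(stages=STAGE_LATENCY_MS):
    """Sum the lower and upper bounds, assuming the stages run one after another."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = proxy_overhead_ms()
print(f"Proxy overhead: {low}-{high} ms")  # Proxy overhead: 15-80 ms
```

Under these assumptions the proxy itself adds roughly 15 to 80 ms, with the Validation Gate and Token Counter dominating; the remainder of the end-to-end time is provider latency.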
Testing Strategy
Each module has unit and integration tests:
tests/
├── test_request_router.py — Parse & validation
├── test_format_adapter.py — Format conversion
├── test_validation_gate.py — Content security
├── test_token_counter.py — Token accuracy
├── test_cache_manager.py — Cache hit/miss logic
├── test_rate_limiter.py — Quota enforcement
├── test_provider_router.py — Provider selection
├── test_circuit_breaker.py — Health checking
├── test_monitor.py — Stats collection
└── integrations/
├── test_anthropic.py — Anthropic integration
├── test_openai.py — OpenAI integration
└── test_litellm.py — LiteLLM integration
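For orientation, a unit test in a file such as `test_rate_limiter.py` might look like the sketch below. The `RateLimiter` class here is a hypothetical stand-in defined inline so the example runs on its own; only the method names `check_limit()` and `consume_quota()` come from the component description, and their signatures are assumed. The test functions use plain `assert` statements, so they work with pytest-style discovery or when called directly.

```python
from typing import Dict


class RateLimiter:
    """Hypothetical stand-in for the module under test (per-IP request cap)."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self._used: Dict[str, int] = {}

    def check_limit(self, client_ip: str) -> bool:
        # True while the client still has quota left.
        return self._used.get(client_ip, 0) < self.max_requests

    def consume_quota(self, client_ip: str) -> None:
        if not self.check_limit(client_ip):
            raise RuntimeError(f"quota exhausted for {client_ip}")
        self._used[client_ip] = self._used.get(client_ip, 0) + 1


def test_allows_requests_under_limit():
    limiter = RateLimiter(max_requests=2)
    limiter.consume_quota("10.0.0.1")
    assert limiter.check_limit("10.0.0.1")


def test_blocks_after_quota_is_consumed():
    limiter = RateLimiter(max_requests=1)
    limiter.consume_quota("10.0.0.1")
    assert not limiter.check_limit("10.0.0.1")


def test_limits_are_tracked_per_ip():
    limiter = RateLimiter(max_requests=1)
    limiter.consume_quota("10.0.0.1")
    assert limiter.check_limit("10.0.0.2")
```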
Extension Points
To add a new feature to TokenPak, you typically:
- Add a new module in the appropriate layer
- Integrate with Monitor to export metrics
- Add tests for your module
- Update configuration if new settings are needed
- Document the new feature in docs/
See docs/CONTRIBUTING.md for detailed extension patterns.
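As a rough illustration of the first two steps, the sketch below shows a new module that reports its own latency to Monitor. Everything here is hypothetical: `PromptTrimmer` is an invented example feature, `InMemoryMonitor` is a stand-in that exists only to make the sketch runnable, and the `record_latency(stage, millis)` signature is an assumption based on the method name listed for Monitor.

```python
import time
from typing import Dict, List, Protocol


class MonitorLike(Protocol):
    """The slice of Monitor this module needs (assumed signature)."""

    def record_latency(self, stage: str, millis: float) -> None: ...


class InMemoryMonitor:
    """Stand-in for Monitor, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self.latencies: Dict[str, List[float]] = {}

    def record_latency(self, stage: str, millis: float) -> None:
        self.latencies.setdefault(stage, []).append(millis)


class PromptTrimmer:
    """Hypothetical new module: strips surrounding whitespace from prompts."""

    def __init__(self, monitor: MonitorLike, enabled: bool = True) -> None:
        self.monitor = monitor
        self.enabled = enabled  # would come from configuration in practice

    def process(self, prompt: str) -> str:
        start = time.perf_counter()
        result = prompt.strip() if self.enabled else prompt
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Export a metric so the new stage shows up alongside the others.
        self.monitor.record_latency("prompt_trimmer", elapsed_ms)
        return result
```

Depending on a small protocol rather than on a concrete Monitor class keeps the new module easy to test in isolation, which also covers the "add tests" step.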