TokenPak Python SDK: Quick Start Guide¶
What is TokenPak?¶
TokenPak is a context compression library for LLM agents and applications. It reduces the size of long context blocks by 40-60% while preserving semantic meaning, allowing you to stay within token budgets without sacrificing information. It works with any LLM provider (OpenAI, Anthropic, Google, local models, and more).
Installation¶
Install TokenPak via pip (assuming the package is published under its import name):
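```bash
pip install tokenpak  # assumes the PyPI package name matches the import name
```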
If you need ML-based compression features, install the full extras (the extras name shown below is an assumption; see the Installation Guide for the exact name):
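```bash
pip install "tokenpak[full]"  # extras name "full" is an assumption
```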
Basic Example¶
Here's a minimal 5-minute example to get you started:
```python
from tokenpak import HeuristicEngine
from tokenpak.engines.base import CompactionHints
# Your long context (could be from a file, API, chat history, etc.)
context = """
Long document with 10,000+ tokens...
[Your actual long text here]
"""
# Create a compression engine
engine = HeuristicEngine()
# Define a target budget (tokens)
hints = CompactionHints(target_tokens=2048)
# Compress the context
compressed = engine.compact(context, hints)
# Use the compressed result in your LLM prompt
prompt = f"""
Context (compressed):
{compressed}
Question: What are the key points?
"""
print(f"Original: {len(context.split())} words")
print(f"Compressed: {len(compressed.split())} words")
print(f"Savings: {100 * (1 - len(compressed) / len(context)):.1f}%")
What Just Happened?¶
- Tokenization: TokenPak tokenizes your input using tiktoken or a compatible tokenizer.
- Block-based compression: The heuristic engine identifies important blocks (headers, summaries, semantic boundaries) and removes or condenses less critical content.
- Deterministic output: The same input always produces the same compressed output, making it safe for reproducible LLM workflows (see the sketch after this list).
- Budget compliance: The engine respects your target token count while prioritizing information density.
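As a quick illustration of the determinism guarantee, here is a minimal sketch that uses only the API from the basic example above (`document` is a stand-in input) and checks that two runs produce identical output:

```python
from tokenpak import HeuristicEngine
from tokenpak.engines.base import CompactionHints

engine = HeuristicEngine()
hints = CompactionHints(target_tokens=256)

document = "A long block of context text... " * 200  # stand-in input

# Deterministic output: the same input and hints yield the same result,
# so compressed prompts are reproducible across runs.
assert engine.compact(document, hints) == engine.compact(document, hints)
```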
Common Use Cases¶
- Chat history compression — Keep long conversation threads within token limits while preserving context (see the sketch after this list)
- Document retrieval augmentation — Compress search results before feeding them to your LLM
- Multi-turn dialogue — Maintain context across many exchanges without hitting rate limits
- Batch processing — Process large document collections with a fixed token budget per item
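For example, here is a minimal sketch of chat history compression. It assumes the history is serialized to plain text first (the engine operates on strings, as in the basic example); the message format below is hypothetical:

```python
from tokenpak import HeuristicEngine
from tokenpak.engines.base import CompactionHints

# Hypothetical message store; in practice this comes from your application.
history = [
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Open Settings, choose Security, then..."},
    # ...many more turns...
]

# Serialize the history to plain text before compacting.
transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)

engine = HeuristicEngine()
compressed_history = engine.compact(transcript, CompactionHints(target_tokens=1024))
```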
Verify Your Install¶
Run a quick smoke test to confirm everything is working (a minimal sketch using only the API from the basic example; exact output will vary):
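```python
from tokenpak import HeuristicEngine
from tokenpak.engines.base import CompactionHints

# Compress a repetitive sample and confirm the result is smaller.
sample = "TokenPak installation check. " * 100
result = HeuristicEngine().compact(sample, CompactionHints(target_tokens=50))

print("Compressed length:", len(result))
print("OK" if len(result) < len(sample) else "Check your install")
```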
Next Steps¶
- Installation Guide — Detailed setup, Python versions, virtual environments
- API Reference — Full API documentation and all available options
- Compression Options — Advanced configuration and custom engines
- Examples — Real-world examples and patterns
- CLI Reference — Command-line tools for compression tasks
Support¶
For questions, issues, or feature requests, visit the TokenPak GitHub repository.