TokenPak Recipe SDK¶
Custom recipe development tooling for building domain-specific compression recipes.
Recipes are declarative YAML files that define how TokenPak compresses content before sending it to LLMs. The Recipe SDK provides a full development workflow: scaffold → validate → test → benchmark → ship.
Quick Start¶
# 1. Scaffold a new recipe
tokenpak recipe create my-legal-cleanup --category legal --domain-example legal
# 2. Edit the generated my-legal-cleanup.yaml to your needs
# 3. Validate the schema
tokenpak recipe validate my-legal-cleanup.yaml
# 4. Test against sample input
tokenpak recipe test my-legal-cleanup.yaml --input-file contract.txt
# 5. Benchmark compression performance
tokenpak recipe benchmark my-legal-cleanup.yaml --runs 10
Recipe Format¶
Every recipe is a YAML file with five top-level keys:
name: my-recipe-name # unique identifier; kebab-case
category: general # see categories below
description: "What it does" # shown in tokenpak demo --list
pattern:
match: extension # how to trigger: any | extension | filename | content | path_pattern
extensions:
- .txt
- .md
action:
compression_hint: 0.20 # expected fraction removed (0.0–1.0)
operations:
- type: strip_comments
- type: collapse_whitespace
Pattern Match Modes¶
| mode | required key | description |
|---|---|---|
any |
— | Always triggers |
extension |
extensions |
Match by file extension list (.py, .md …) |
filename |
filenames |
Match exact filenames (Makefile, Cargo.toml) |
content |
keywords |
Match if content contains any keyword |
path_pattern |
path_patterns |
Match via regex on file path |
Built-in Operation Types¶
| type | key params | description |
|---|---|---|
regex_replace |
pattern, replacement, flags |
Regex substitution (re.MULTILINE etc.) |
strip_comments |
— | Remove # single-line comments |
deduplicate_lines |
— | Remove duplicate lines (preserves order) |
truncate_lines |
max_length (int, default 120) |
Truncate long lines |
remove_empty_lines |
max_consecutive (int, default 1) |
Collapse excessive blank lines |
collapse_whitespace |
— | Normalize spaces and excessive newlines |
python_docstring_compress |
mode: keep_summary | remove |
Shorten or remove Python docstrings |
remove_filler_phrases |
— | Remove hedging language |
json_compact |
— | Minify JSON (removes whitespace) |
keyword_filter |
keep_keywords (list) |
Keep only lines containing keywords |
Known Categories¶
python, javascript, typescript, markdown, yaml, json, sql, html,
css, general, legal, medical, config, logs, git
CLI Reference¶
tokenpak recipe create <name>¶
Scaffold a new recipe file.
Options:
--output-dir DIR Where to write the file (default: current dir)
--category CAT Category hint (default: general)
--description TEXT Short description
--match-mode MODE any | extension | filename | content | path_pattern
--ext EXT Extension hint for extension match mode
--domain-example legal | medical (use a domain-specific template)
tokenpak recipe validate <file>¶
Check a recipe against the schema. Exits 1 on hard errors. Prints warnings for soft issues (unknown category, empty description, unknown operation types).
tokenpak recipe test <file>¶
Run a recipe against sample input and print a before/after report.
Options:
--input-text TEXT Raw text to test against
--input-file FILE Read test input from a file
--filename-hint NAME Filename to check pattern matching against
Output:
Pattern match : ✅ yes
Ops applied : strip_comments, collapse_whitespace
Input chars : 1024
Output chars : 812
Compression : 20.7% removed
Hint vs actual : 20.0% expected → 20.7% actual
tokenpak recipe benchmark <file>¶
Measure compression ratio and throughput across multiple samples.
Options:
--samples-file FILE JSON list of sample strings (default: auto-generated)
--runs N Repetitions per sample for timing (default: 5)
Output:
Compression (mean) : 21.3% [min 18.4% – max 24.1%]
Hint vs actual : 20.0% → 21.3% (+1.3% delta)
Timing ms (mean) : 0.142 ms [min 0.098 – max 0.201]
Domain Examples¶
Bundled in recipes/custom-examples/:
| File | Domain | What it does |
|---|---|---|
legal-boilerplate-removal.yaml |
Legal | Strips WHEREAS recitals + signature blocks |
medical-note-cleanup.yaml |
Medical | Removes PHI headers + confidentiality notices |
legal-contract-clause-extract.yaml |
Legal | Keeps operative clauses; removes exhibits |
Generate a domain template with:
tokenpak recipe create my-legal --domain-example legal
tokenpak recipe create my-medical --domain-example medical
Programmatic Usage¶
from tokenpak.agent.recipe_sdk import RecipeSDK
sdk = RecipeSDK()
# Scaffold
path = sdk.create("my-recipe", category="legal", domain_example="legal")
# Validate
warnings = sdk.validate("my-recipe.yaml") # raises RecipeValidationError on failure
# Test
result = sdk.test("my-recipe.yaml", input_text="WHEREAS Party A...")
print(result["compression_ratio"]) # e.g. 0.34
# Benchmark
bench = sdk.benchmark("my-recipe.yaml", runs=10)
print(bench["compression"]["mean"]) # e.g. 0.33
Tips¶
- Start with
--domain-examplefor legal/medical content — saves time. - Keep
compression_hinthonest: it's used by the intelligence server to estimate token savings when auto-selecting recipes. - Use
contentmatch mode for domain recipes — file extensions are ambiguous for legal/medical text. - Run
benchmarkbefore shipping — a recipe that adds zero compression isn't worth the CPU.