Recipe Development Guide
Build custom compression recipes to squeeze more savings from your specific domain.
What Are Recipes?
Recipes are declarative YAML files that tell TokenPak how to compress content. Each recipe is matched by file extension, filename, path, or content keywords, and then applies a sequence of operations that reduce token count while preserving semantics.
TokenPak ships with built-in recipes for Python, JavaScript, Markdown, JSON, YAML, SQL, and more. Custom recipes let you target your own conventions.
Quick Start
# Scaffold a new recipe
tokenpak recipe create my-legal-cleanup --category legal
# Edit it
$EDITOR my-legal-cleanup.yaml
# Validate the schema
tokenpak recipe validate my-legal-cleanup.yaml
# Test against sample input
tokenpak recipe test my-legal-cleanup.yaml --input-file contract.txt
# Benchmark compression performance
tokenpak recipe benchmark my-legal-cleanup.yaml --runs 10
# Install to your active recipe set
tokenpak recipe install my-legal-cleanup.yaml
Recipe Format
Every recipe is a YAML file with five top-level keys:
name: my-recipe-name          # unique identifier; kebab-case
category: general             # see categories below
description: "What it does"   # shown in tokenpak demo --list
pattern:
  match: extension            # how to trigger this recipe
  extensions:
    - .txt
    - .md
action:
  compression_hint: 0.20      # expected fraction removed (0.0–1.0)
  operations:
    - type: strip_comments
    - type: collapse_whitespace
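To make the structure above concrete, here is a minimal Python sketch that checks a recipe already parsed into a dictionary. It is not TokenPak's validator (tokenpak recipe validate does that job); the function name check_recipe and the specific checks, including the requirement of at least one operation, are assumptions drawn only from the format shown here.

```python
# Hypothetical structural check; not part of TokenPak.
REQUIRED_KEYS = {"name", "category", "description", "pattern", "action"}

def check_recipe(recipe: dict) -> list:
    """Return a list of problems found in a parsed recipe (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - set(recipe)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    action = recipe.get("action", {})
    hint = action.get("compression_hint")
    # compression_hint is documented as a fraction between 0.0 and 1.0
    if not isinstance(hint, (int, float)) or not 0.0 <= hint <= 1.0:
        problems.append("compression_hint must be a number in 0.0-1.0")
    # Assumption: a recipe without operations is not useful
    if not action.get("operations"):
        problems.append("action.operations must list at least one operation")
    return problems

recipe = {
    "name": "my-recipe-name",
    "category": "general",
    "description": "What it does",
    "pattern": {"match": "extension", "extensions": [".txt", ".md"]},
    "action": {
        "compression_hint": 0.20,
        "operations": [{"type": "strip_comments"}, {"type": "collapse_whitespace"}],
    },
}
print(check_recipe(recipe))  # prints []
```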
Pattern Match Modes
| Mode | Required key | Description |
|---|---|---|
| any | none | Always triggers (use carefully) |
| extension | extensions | Match by file extension: .py, .md, etc. |
| filename | filenames | Exact filenames: Makefile, Cargo.toml |
| content | keywords | Triggers if content contains any keyword |
| path_pattern | path_patterns | Regex match on full file path |
Examples:
# Match Python files
pattern:
  match: extension
  extensions: [.py]
# Match test files anywhere
pattern:
  match: path_pattern
  path_patterns: [".*/tests?/.*"]
# Match content with legal boilerplate
pattern:
  match: content
  keywords: ["WHEREAS", "INDEMNIFY", "hereinafter"]
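One way to reason about the five match modes is to model them as a single predicate. The sketch below is an illustration only, not TokenPak's matcher: the function name recipe_matches is hypothetical, and case-sensitive keyword matching and the use of re.search for path_pattern are assumptions.

```python
import re
from pathlib import PurePosixPath

def recipe_matches(pattern: dict, path: str, content: str) -> bool:
    """Evaluate one `pattern:` block against a file path and its content."""
    mode = pattern["match"]
    p = PurePosixPath(path)
    if mode == "any":
        return True
    if mode == "extension":
        return p.suffix in pattern["extensions"]
    if mode == "filename":
        return p.name in pattern["filenames"]
    if mode == "content":
        # Assumption: plain, case-sensitive substring search
        return any(kw in content for kw in pattern["keywords"])
    if mode == "path_pattern":
        return any(re.search(rx, path) for rx in pattern["path_patterns"])
    raise ValueError(f"unknown match mode: {mode}")

tests_pattern = {"match": "path_pattern", "path_patterns": [".*/tests?/.*"]}
print(recipe_matches(tests_pattern, "/repo/tests/test_x.py", ""))  # True
print(recipe_matches(tests_pattern, "tests/test_x.py", ""))        # False
```

The second call returns False because the pattern requires a "/" before the tests directory, so whether paths are absolute or relative matters when writing path_patterns.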
Built-in Operations
Text Operations
| Operation | Params | Description |
|---|---|---|
| strip_comments | none | Remove # single-line comments |
| collapse_whitespace | none | Normalize spaces and newlines |
| remove_empty_lines | max_consecutive (default: 1) | Collapse blank lines |
| deduplicate_lines | none | Remove duplicate lines (preserves order) |
| truncate_lines | max_length (default: 120) | Truncate long lines |
| remove_filler_phrases | none | Remove hedging language ("Please note that...") |
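To give a feel for what two of these operations plausibly do, here is a small Python sketch. The function bodies are guesses based only on the table descriptions (the names mirror the operation names for readability); TokenPak's actual behavior, for example around blank lines, may differ.

```python
def deduplicate_lines(text: str) -> str:
    """Drop repeated lines, keeping the first occurrence of each (order preserved)."""
    seen = set()
    kept = []
    for line in text.split("\n"):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

def remove_empty_lines(text: str, max_consecutive: int = 1) -> str:
    """Allow at most `max_consecutive` blank lines in a row."""
    kept, blanks = [], 0
    for line in text.split("\n"):
        blanks = blanks + 1 if not line.strip() else 0
        if blanks <= max_consecutive:
            kept.append(line)
    return "\n".join(kept)

print(deduplicate_lines("a\nb\na\nc"))          # a, b, c on separate lines
print(repr(remove_empty_lines("a\n\n\n\nb")))   # 'a\n\nb'
```

Note that this naive deduplication also treats repeated blank lines as duplicates, which is one reason operation order inside a recipe can matter.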
Code Operations
| Operation | Params | Description |
|---|---|---|
| python_docstring_compress | mode: keep_summary or remove | Shorten Python docstrings |
| regex_replace | pattern, replacement, flags | Arbitrary regex substitution |
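The flag names used with regex_replace in this guide (MULTILINE, DOTALL) coincide with Python's re module flags. Assuming the operation follows those semantics, which this guide does not state, the following sketch shows how the two flags differ. The helper apply_regex_replace is hypothetical.

```python
import re

def apply_regex_replace(text: str, pattern: str, replacement: str, flags: str = "") -> str:
    """Apply one substitution; `flags` is a single re flag name or empty."""
    flag_value = getattr(re, flags) if flags else 0
    return re.sub(pattern, replacement, text, flags=flag_value)

code = "# header\nx = 1\n# footer"
# MULTILINE lets ^ and $ match at every line boundary
print(repr(apply_regex_replace(code, r"^#.*$", "", "MULTILINE")))  # '\nx = 1\n'
# Without it, ^ and $ only anchor to the whole string, so nothing matches
print(repr(apply_regex_replace(code, r"^#.*$", "")))               # unchanged

note = "a %%x\ny%% b"
# DOTALL lets . cross newlines, so one match can span several lines
print(repr(apply_regex_replace(note, r"%%.*?%%", "", "DOTALL")))   # 'a  b'
```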
Structured Data Operations
| Operation | Params | Description |
|---|---|---|
| json_compact | none | Minify JSON (remove whitespace) |
| keyword_filter | keep_keywords (list) | Keep only lines containing these keywords |
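Both operations are easy to approximate with the Python standard library, which can help when estimating savings before writing a recipe. This is a sketch of the described behavior, not TokenPak's code; in particular, case-sensitive substring matching in keyword_filter is an assumption.

```python
import json

def json_compact(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)

def keyword_filter(text: str, keep_keywords: list) -> str:
    """Keep only the lines that contain at least one of the keywords."""
    return "\n".join(
        line for line in text.split("\n")
        if any(kw in line for kw in keep_keywords)
    )

print(json_compact('{\n  "a": 1,\n  "b": [1, 2]\n}'))   # {"a":1,"b":[1,2]}
print(keyword_filter("INFO ok\nERROR disk full\nDEBUG x", ["ERROR"]))  # ERROR disk full
```

keyword_filter is the most aggressive operation listed here, since every line without a keyword is dropped.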
Recipe Examples
Strip Python Type Annotations (Lossy)
name: python-strip-types
category: python
description: Remove type annotations to save tokens in large Python files
pattern:
  match: extension
  extensions: [.py]
action:
  compression_hint: 0.15
  operations:
    # Matches only annotations followed by "=" (annotated assignments, parameters with defaults)
    - type: regex_replace
      pattern: ':\s*(str|int|float|bool|list|dict|Optional\[.*?\]|List\[.*?\])\s*='
      replacement: ' ='
      flags: MULTILINE
    - type: python_docstring_compress
      mode: keep_summary
    - type: remove_empty_lines
      max_consecutive: 1
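It can help to run a recipe's regex against a small sample before installing it. The sketch below applies the pattern from this recipe with Python's re.sub, assuming regex_replace behaves the same way; the sample source is invented for illustration.

```python
import re

PATTERN = r':\s*(str|int|float|bool|list|dict|Optional\[.*?\]|List\[.*?\])\s*='

source = '''count: int = 0
names: List[str] = []
def greet(name: str, times: int = 1) -> str:
    return name * times
'''

stripped = re.sub(PATTERN, ' =', source, flags=re.MULTILINE)
print(stripped)
# count = 0
# names = []
# def greet(name: str, times = 1) -> str:
#     return name * times
```

Annotations without a following "=" (the bare parameter name: str and the return annotation) survive, so the savings come mainly from annotated assignments and defaulted parameters.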
Legal Boilerplate Squasher
name: legal-boilerplate
category: legal
description: Compress repetitive legal preamble while preserving operative clauses
pattern:
  match: content
  keywords: ["WHEREAS", "NOW, THEREFORE", "hereinafter referred to"]
action:
  compression_hint: 0.30
  operations:
    - type: regex_replace
      pattern: 'WHEREAS.*?(?=WHEREAS|NOW, THEREFORE)'
      replacement: '[WHEREAS clause omitted]\n'
      flags: DOTALL
    - type: remove_filler_phrases
    - type: collapse_whitespace
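The lookahead in this pattern is easier to follow on a concrete input. The following sketch runs only the regex_replace step with Python's re.sub, assuming the operation uses equivalent semantics; the contract text is made up.

```python
import re

PATTERN = r'WHEREAS.*?(?=WHEREAS|NOW, THEREFORE)'
REPLACEMENT = r'[WHEREAS clause omitted]\n'

contract = (
    "WHEREAS, the Seller owns the Goods;\n"
    "WHEREAS, the Buyer wishes to purchase\nthe Goods;\n"
    "NOW, THEREFORE, the parties agree as follows.\n"
)

# DOTALL lets each match run across line breaks inside a clause
squashed = re.sub(PATTERN, REPLACEMENT, contract, flags=re.DOTALL)
print(squashed)
# [WHEREAS clause omitted]
# [WHEREAS clause omitted]
# NOW, THEREFORE, the parties agree as follows.
```

Each clause is consumed up to, but not including, the next WHEREAS or NOW, THEREFORE. A WHEREAS clause that is followed by neither marker does not match and is left as written.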
Test File Compressor
name: test-file-compress
category: python
description: Strip verbose test descriptions, keep assertions
pattern:
  match: path_pattern
  path_patterns: [".*/tests?/.*\\.py$", ".*test_.*\\.py$"]
action:
  compression_hint: 0.25
  operations:
    - type: python_docstring_compress
      mode: remove
    - type: strip_comments
    - type: remove_empty_lines
      max_consecutive: 1
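Path patterns are worth checking against a handful of real paths, since loose regexes can fire on files that are not tests. The sketch below uses the two patterns from this recipe (the YAML "\\." becomes the regex \.) and assumes they are applied with search semantics to the full path; the paths are invented.

```python
import re

PATH_PATTERNS = [r".*/tests?/.*\.py$", r".*test_.*\.py$"]

def fires(path: str) -> bool:
    """True if any of the recipe's path patterns matches the path."""
    return any(re.search(rx, path) for rx in PATH_PATTERNS)

for path in [
    "/repo/tests/test_api.py",    # True: inside tests/ and named test_*
    "/repo/test/conftest.py",     # True: inside test/
    "/repo/src/contest_utils.py", # True: "contest_utils" contains "test_"
    "/repo/src/app.py",           # False
    "/repo/tests/data.json",      # False: not a .py file
]:
    print(path, fires(path))
```

The third path shows a false positive: tightening the second pattern, for example by requiring a "/" immediately before test_, would avoid it.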
Obsidian Note Cleaner
name: obsidian-notes
category: markdown
description: Strip Obsidian-specific syntax before sending notes to LLMs
pattern:
  match: content
  keywords: ["%%", "^", "[["]
action:
  compression_hint: 0.10
  operations:
    - type: regex_replace
      pattern: '%%.*?%%'
      replacement: ''
      flags: DOTALL
    - type: regex_replace
      pattern: '\[\[([^\]|]+)\|([^\]]+)\]\]'
      replacement: '\2'
    - type: regex_replace
      pattern: '\[\[([^\]]+)\]\]'
      replacement: '\1'
    - type: remove_empty_lines
      max_consecutive: 1
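The order of the three regex_replace steps matters, which a quick run makes visible. This sketch replays them with Python's re.sub on an invented note, assuming the operations run in the listed order with equivalent regex semantics.

```python
import re

note = "See [[Project Plan|the plan]] and [[Inbox]].%% private\nscratch %% Done."

steps = [
    (r'%%.*?%%', '', re.DOTALL),                   # drop %% comments %%
    (r'\[\[([^\]|]+)\|([^\]]+)\]\]', r'\2', 0),    # [[target|alias]] -> alias
    (r'\[\[([^\]]+)\]\]', r'\1', 0),               # [[target]] -> target
]

cleaned = note
for pattern, replacement, flags in steps:
    cleaned = re.sub(pattern, replacement, cleaned, flags=flags)

print(cleaned)  # See the plan and Inbox. Done.
```

If the plain-link step ran before the alias step, [[Project Plan|the plan]] would become "Project Plan|the plan", because the plain-link pattern also accepts the "|" character.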
Testing & Benchmarking
Test Against Sample Input
tokenpak recipe test my-recipe.yaml --input-file sample.py
# Shows: original tokens, compressed tokens, reduction %
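The reduction percentage is the fraction of tokens removed, on the same 0.0 to 1.0 scale as compression_hint. As a rough sketch of the arithmetic, the snippet below counts whitespace-separated words as a stand-in for real tokens; TokenPak's tokenizer will give different absolute numbers, and both function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def reduction(original: str, compressed: str) -> float:
    """Fraction of tokens removed (0.0 means no savings)."""
    before = count_tokens(original)
    if before == 0:
        return 0.0
    return (before - count_tokens(compressed)) / before

original = "x = 1  # set x\ny = 2  # set y\n"
compressed = "x = 1\ny = 2\n"
print(f"{count_tokens(original)} -> {count_tokens(compressed)} tokens, "
      f"{reduction(original, compressed):.0%} reduction")
# 12 -> 6 tokens, 50% reduction
```

Comparing this figure with the recipe's compression_hint is a quick way to see whether the hint is realistic for your files.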
Benchmark Multiple Runs
tokenpak recipe benchmark my-recipe.yaml --runs 10
# Reports: mean reduction, p95 latency, worst-case output
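For readers unfamiliar with the reported statistics, here is a minimal sketch of how mean reduction and p95 latency can be computed over repeated runs. It is not how tokenpak recipe benchmark is implemented: the benchmark function, the character-based reduction measure, and the compress callable are all assumptions, and the input text is assumed to be non-empty.

```python
import statistics
import time

def benchmark(compress, text: str, runs: int = 10) -> dict:
    """Time `compress(text)` over several runs and summarize the results."""
    latencies, reductions = [], []
    for _ in range(runs):
        start = time.perf_counter()
        out = compress(text)
        latencies.append(time.perf_counter() - start)
        # Character-based proxy for the fraction removed
        reductions.append(1 - len(out) / len(text))
    # With n=20 the last cut point is the 95th percentile
    p95 = statistics.quantiles(latencies, n=20, method="inclusive")[-1]
    return {
        "mean_reduction": statistics.mean(reductions),
        "worst_reduction": min(reductions),
        "p95_latency_s": p95,
    }

result = benchmark(lambda s: " ".join(s.split()), "a   b    c\n\n\nd", runs=10)
print(result["mean_reduction"])  # 0.5
```

p95 latency is the value below which 95% of runs fall, so it reflects slow outliers better than the mean does.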
A/B Test Two Recipes
tokenpak ab create recipe-test \
  --variant-a "recipe:my-recipe-v1" \
  --variant-b "recipe:my-recipe-v2"
tokenpak ab status recipe-test
# After enough traffic:
tokenpak ab apply recipe-test # apply the winner
Known Categories
python, javascript, typescript, markdown, yaml, json, sql, html,
css, shell, go, rust, java, kotlin, swift, ruby, php,
legal, medical, financial, ops, general
Use an existing category when possible — it helps TokenPak apply category-level heuristics.
Installing & Sharing Recipes
# Install a local recipe file
tokenpak recipe install my-recipe.yaml
# List installed recipes
tokenpak recipe list
# Remove a recipe
tokenpak recipe remove my-recipe-name
Recipes are stored in ~/.tokenpak/recipes/. You can version them in your project's git repo and install on deploy.
Tips
- Start with compression_hint: 0.10. Be conservative, test, then increase
- Never compress code that will be executed; recipes are for context, not output
- Use regex_replace with flags: MULTILINE for multiline patterns (the examples above use flags: DOTALL where a single match has to span several lines)
- Test with real files from your domain, not toy examples
- Check tokenpak trace --last to see which recipe fired on a recent request