Use TokenPak with Gemini CLI
This guide is for developers using Google's Gemini CLI who want TokenPak cost tracking, telemetry, and compression on Gemini requests.
Gemini CLI supports a custom Gemini API base URL through GOOGLE_GEMINI_BASE_URL. Point that variable at TokenPak's local proxy. TokenPak then forwards Google Generative AI requests upstream while recording usage.
What you need before starting:
- Gemini CLI (@google/gemini-cli) installed
- TokenPak installed locally
- GEMINI_API_KEY for Google AI Studio
- A shell where you can export environment variables before launching gemini
Copy-paste setup
pip install tokenpak
tokenpak setup
curl -s http://localhost:8766/health | python3 -m json.tool
Then launch Gemini CLI from the same shell:
export GEMINI_API_KEY="your-gemini-api-key"
export GOOGLE_GEMINI_BASE_URL="http://localhost:8766"
gemini -p "Reply with one sentence confirming Gemini CLI is routed through TokenPak."
Do not add /v1 to GOOGLE_GEMINI_BASE_URL. Gemini CLI sends Google Generative AI paths such as /v1beta/models/...:generateContent; TokenPak detects those paths and routes them to the Google adapter.
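The no-version-path rule can be checked mechanically before launching. The following is a minimal sketch, assuming a helper you define yourself; check_base_url is a hypothetical name and is not part of TokenPak or Gemini CLI. It rejects an empty value and any value that ends in /v1 or /v1beta.

```shell
# Hypothetical helper: not shipped with TokenPak or Gemini CLI.
# Returns 0 when the URL has no trailing version path, 1 otherwise.
check_base_url() {
  url="${1%/}"   # ignore a single trailing slash
  case "$url" in
    "")
      echo "base URL is empty" >&2
      return 1 ;;
    */v1|*/v1beta)
      echo "remove the version path from: $1" >&2
      return 1 ;;
    *)
      return 0 ;;
  esac
}
```

One way to use it is as a guard: check_base_url "$GOOGLE_GEMINI_BASE_URL" && gemini -p "Say hello through TokenPak."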
1. Start TokenPak
pip install tokenpak
tokenpak setup
tokenpak setup detects provider keys, creates ~/.tokenpak/config.yaml, and starts the proxy on port 8766. You should see:
TokenPak proxy listening on http://localhost:8766
Confirm the proxy is healthy:
curl -s http://localhost:8766/health | python3 -m json.tool
Expected response shape:
{
"status": "ok",
"uptime_seconds": 3,
"version": "1.7.1",
"requests_total": 0,
"requests_errors": 0,
"compression_ratio_avg": 0.0
}
If status is not "ok", run tokenpak status before launching Gemini CLI.
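For scripted setups, the status check can be turned into an exit code. This is a sketch under the assumption that the response body looks like the shape shown above; health_is_ok is a hypothetical helper name, and it uses pattern matching rather than a real JSON parser.

```shell
# Hypothetical helper: reads a /health JSON body on stdin and succeeds
# only when the "status" field is "ok". Pattern matching is a shortcut,
# not a real JSON parser.
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}
```

A possible invocation: curl -s http://localhost:8766/health | health_is_ok || tokenpak status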
2. Launch Gemini CLI through TokenPak
Export the Gemini key and base URL in the same terminal where you run gemini:
export GEMINI_API_KEY="your-gemini-api-key"
export GOOGLE_GEMINI_BASE_URL="http://localhost:8766"
gemini -p "Say hello through TokenPak."
Use GOOGLE_GEMINI_BASE_URL for Google AI Studio / Gemini API traffic. If you intentionally use Vertex AI mode, Gemini CLI also supports GOOGLE_VERTEX_BASE_URL, but Vertex setup has separate project and location requirements; this guide focuses on the Google AI Studio path.
3. Verify traffic is routed through TokenPak
After Gemini CLI returns a response:
tokenpak status
You should see at least one recent request. You can also check /health again:
curl -s http://localhost:8766/health | python3 -m json.tool
If requests_total remains 0, Gemini CLI did not inherit GOOGLE_GEMINI_BASE_URL; see Troubleshooting.
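Comparing requests_total before and after a prompt makes the check unambiguous even when the counter is already above 0. The sketch below assumes the /health body contains requests_total as a plain integer, as in the example response; the two function names are hypothetical.

```shell
# Hypothetical helper: prints the integer value of "requests_total"
# from a /health JSON body read on stdin. Uses sed rather than a JSON
# parser, so it assumes the field is a plain non-negative integer.
requests_total() {
  sed -n 's/.*"requests_total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when the counter grew between two snapshots.
# Usage: traffic_increased BEFORE AFTER
traffic_increased() {
  [ "$2" -gt "$1" ]
}

# Example flow (commented out because it needs a running proxy):
#   before=$(curl -s http://localhost:8766/health | requests_total)
#   gemini -p "Say hello through TokenPak."
#   after=$(curl -s http://localhost:8766/health | requests_total)
#   traffic_increased "$before" "$after" || echo "request did not reach TokenPak"
```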
4. Check your savings
After a few Gemini CLI prompts:
tokenpak cost --week # spend by model
tokenpak savings # tokens compressed vs. uncompressed
Short prompts may pass through with little or no compression. Larger repeated context is where TokenPak has the most room to reduce tokens.
Troubleshooting
requests_total stays 0 after Gemini CLI responds
Gemini CLI is not using the TokenPak base URL. Confirm:
- GOOGLE_GEMINI_BASE_URL is exported in the same shell that runs gemini.
- The value is exactly http://localhost:8766 (or the alternate port you started TokenPak on).
- You did not include /v1 or /v1beta in the base URL.
- You restarted Gemini CLI after changing the variable.
Run this in the same shell before launching Gemini CLI:
printf '%s\n' "$GOOGLE_GEMINI_BASE_URL"
Proxy not started: Gemini CLI shows connection errors
Verify the proxy directly:
curl -s http://localhost:8766/health
If this returns Connection refused, start the proxy:
tokenpak serve
You can also re-run tokenpak setup if this is your first install.
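If the proxy is started from a script, the health check may run before the port is open. A small retry loop avoids that race. This is a generic sketch, not a TokenPak feature; retry_until is a hypothetical helper that accepts any probe command.

```shell
# Hypothetical helper: retries a probe command until it succeeds or the
# attempt budget runs out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
retry_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt remains.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, retry_until 10 1 curl -sf http://localhost:8766/health waits up to roughly ten seconds for the proxy to answer.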
Port collision: proxy fails to start on 8766
If 8766 is already in use:
lsof -i :8766
Stop the conflicting process, then restart TokenPak. Alternatively, run TokenPak on another port and update Gemini CLI's base URL:
TOKENPAK_PORT=8767 tokenpak serve
export GOOGLE_GEMINI_BASE_URL="http://localhost:8767"
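When the port changes, the proxy port and the client URL have to change together. One way to keep them in sync is to derive the URL from TOKENPAK_PORT. The helper name tokenpak_base_url is hypothetical, and the sketch assumes you export TOKENPAK_PORT in the shell that launches gemini as well.

```shell
# Hypothetical helper: derives the Gemini CLI base URL from TOKENPAK_PORT
# so the proxy port and the client URL cannot drift apart.
# Falls back to 8766, the default port used in this guide.
tokenpak_base_url() {
  printf 'http://localhost:%s\n' "${TOKENPAK_PORT:-8766}"
}
```

With that in place, export GOOGLE_GEMINI_BASE_URL="$(tokenpak_base_url)" picks up whichever port is currently configured.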
Auth errors: 401 or invalid API key
Gemini CLI still needs a valid Gemini API key. TokenPak does not replace credentials. Confirm:
- GEMINI_API_KEY is exported in the same shell that runs gemini.
- The key is valid for Google AI Studio / Gemini API.
- You are not mixing Vertex AI variables with Google AI Studio variables.
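The first and last items in the checklist above can be checked locally. The sketch below is a hypothetical preflight, not part of either tool. It treats having both GOOGLE_GEMINI_BASE_URL and GOOGLE_VERTEX_BASE_URL exported as a sign of mixing, which is an assumption about what mixing looks like in practice, and it cannot tell whether the key itself is valid.

```shell
# True when the named variable is exported with a non-empty value.
# `env` lists only exported variables, which is what child processes see.
is_exported() {
  env | grep -q "^$1=."
}

# Hypothetical preflight: not part of TokenPak or Gemini CLI.
# Prints one line per problem and returns non-zero if any were found.
check_gemini_env() {
  problems=0
  if ! is_exported GEMINI_API_KEY; then
    echo "GEMINI_API_KEY is not exported in this shell"
    problems=$((problems + 1))
  fi
  if is_exported GOOGLE_GEMINI_BASE_URL && is_exported GOOGLE_VERTEX_BASE_URL; then
    echo "both GOOGLE_GEMINI_BASE_URL and GOOGLE_VERTEX_BASE_URL are exported"
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]
}
```

Running check_gemini_env in the shell that launches gemini prints nothing and returns 0 when both checks pass.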
Environment caching: variable changes do not take effect
Gemini CLI reads environment variables when the process starts. After changing GOOGLE_GEMINI_BASE_URL:
- Stop the current gemini process.
- Export the new value.
- Start a new gemini command.
- Check tokenpak status again.
If you run Gemini CLI from an editor task runner or terminal multiplexer, make sure that runner inherits the updated environment.
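When it is unclear what a runner inherits, one option is to set the variable for a single command rather than relying on the parent environment. The wrapper below is a sketch; with_tokenpak is a hypothetical name, and it assumes the proxy listens on TOKENPAK_PORT or on 8766 when that variable is unset.

```shell
# Hypothetical wrapper: runs any command with the TokenPak base URL set
# for that one process, so it does not depend on what the parent
# runner exported. The calling shell's environment is left unchanged.
with_tokenpak() {
  env GOOGLE_GEMINI_BASE_URL="http://localhost:${TOKENPAK_PORT:-8766}" "$@"
}
```

For example: with_tokenpak gemini -p "Say hello through TokenPak."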
Tools or function-calling requests fail
TokenPak's Google adapter does not yet translate Google function-calling/tool schemas. Plain text prompts are supported; tool-heavy workflows may fail loudly instead of being silently altered. Use Claude Code, OpenAI SDK, or Cline routes for tool-calling workflows until Google tool translation ships.