TokenPak Testing Guide

Overview

TokenPak uses pytest for its test suite, with structured test organization and coverage tracking. This guide covers how to run, write, and maintain tests, with particular attention to the Phase 2 coverage additions.

Quick Start

Run all tests

pytest tests/ -v

Run Phase 2 tests (validation + error + cache)

pytest tests/test_validation*.py tests/test_error*.py tests/test_cache*.py -v

Run with coverage

pytest tests/ --cov=tokenpak --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open htmlcov/index.html

Run a single test file

pytest tests/test_validation.py -v

Run tests matching a pattern

pytest -k "test_schema" -v

Test Organization

tests/
├── test_validation.py          # Phase 2: Input validation, schemas
├── test_error_handling.py       # Phase 2: Error recovery, logging
├── test_cache.py                # Phase 2: Cache behavior, TTL, concurrency
├── test_adapters.py             # Core: Adapter implementations
├── test_routing.py              # Core: Provider routing logic
├── test_compression.py          # Core: Token compression
├── test_integration.py          # Integration tests (slow)
└── conftest.py                  # Shared fixtures

Test Markers

Tests are marked with @pytest.mark to enable selective running:

Marker       Use Case                     Run Command
-----------  ---------------------------  ----------------------
unit         Fast unit tests              pytest -m unit
integration  Requires external services   pytest -m integration
slow         >1 second per test           pytest -m "not slow"
phase2       Phase 2 coverage additions   pytest -m phase2
xfail        Known failures (expected)    Displayed in report
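
Custom markers such as phase2 and slow have to be registered with pytest; otherwise each use triggers a PytestUnknownMarkWarning, or an error under --strict-markers. This guide does not show how TokenPak registers them. A minimal sketch, assuming registration lives in tests/conftest.py and reusing the descriptions from the table above (xfail is built in and needs no registration):

```python
# tests/conftest.py (assumed location; registration could equally live in pytest.ini)

# Marker names and descriptions mirror the marker table in this guide.
MARKERS = {
    "unit": "fast unit tests",
    "integration": "requires external services",
    "slow": "takes more than 1 second per test",
    "phase2": "Phase 2 coverage additions",
}


def pytest_configure(config):
    """Register custom markers so `pytest -m <name>` runs without warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Running pytest --markers afterwards should list the four custom markers alongside the built-in ones.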

Run unit tests only

pytest -m "not integration and not slow" -v

Run Phase 2 tests with coverage

pytest -m phase2 \
  --cov=tokenpak.validation \
  --cov=tokenpak.core.error_handling \
  --cov=tokenpak.cache \
  --cov-report=term-missing

Phase 2 Test Scope

Phase 2 adds comprehensive testing for three critical modules:

Validation (tests/test_validation.py)

Tests input validation, schema enforcement, and type checking.

Coverage areas:

  • Valid config parsing
  • Invalid/missing required fields
  • Type coercion (str → int, bool, etc.)
  • Default value handling
  • Nested object validation
  • Array/list validation

Example:

@pytest.mark.phase2
def test_schema_validation_missing_required_field():
    """Invalid config missing 'provider' field should raise ValidationError."""
    config = {"model": "gpt-4"}  # Missing required 'provider'
    with pytest.raises(ValidationError):
        validate_config(config)
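
Several of the coverage areas above (type coercion, default values, missing fields) fit naturally into parametrized tests. The sketch below defines a small stand-in validate_config and ValidationError inline so that it runs on its own. The real tokenpak.validation API, its return type, and its default values are not shown in this guide and may differ.

```python
import pytest


class ValidationError(Exception):
    """Stand-in for TokenPak's ValidationError (assumption)."""


def validate_config(config):
    """Hypothetical validator: requires 'provider', coerces 'max_tokens' to int."""
    if "provider" not in config:
        raise ValidationError("missing required field: provider")
    result = dict(config)
    raw = result.get("max_tokens", 1024)  # the default of 1024 is an assumption
    try:
        result["max_tokens"] = int(raw)
    except (TypeError, ValueError) as exc:
        raise ValidationError(f"max_tokens must be an integer, got {raw!r}") from exc
    return result


@pytest.mark.phase2
@pytest.mark.parametrize("raw, expected", [("4096", 4096), (512, 512)])
def test_max_tokens_is_coerced_to_int(raw, expected):
    config = validate_config({"provider": "anthropic", "max_tokens": raw})
    assert config["max_tokens"] == expected


@pytest.mark.phase2
def test_max_tokens_default_applied_when_absent():
    assert validate_config({"provider": "anthropic"})["max_tokens"] == 1024


@pytest.mark.phase2
@pytest.mark.parametrize(
    "bad_config",
    [{"model": "gpt-4"}, {"provider": "anthropic", "max_tokens": "lots"}],
)
def test_invalid_config_raises(bad_config):
    with pytest.raises(ValidationError):
        validate_config(bad_config)
```

Each parametrize entry is reported as its own test case, so a failing input is identified by name in the pytest output.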

Error Handling (tests/test_error_handling.py)

Tests error recovery, logging, and context propagation.

Coverage areas:

  • API errors (timeout, auth failure, rate limit)
  • Fallback logic (retry, alternative provider)
  • Error logging and context
  • Circuit breaker behavior
  • Graceful degradation

Example:

@pytest.mark.phase2
def test_retry_on_timeout():
    """Timeout should trigger automatic retry."""
    with mock.patch('requests.post', side_effect=Timeout()) as mock_post:
        result = call_with_retry(max_retries=2)
        # call_count lives on the mock object returned by patch(), not on mock.patch
        assert mock_post.call_count == 2
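
A retry test is easier to reason about when the mock's side_effect is a sequence, so each attempt has a scripted outcome. The sketch below is self-contained: call_with_retry here is a hypothetical stand-in that takes the callable as an argument and treats max_retries as the number of retries after the first attempt. TokenPak's own helper may have a different signature and may count attempts differently.

```python
from unittest import mock


def call_with_retry(func, max_retries=2):
    """Hypothetical helper: call func, retrying up to max_retries times on TimeoutError."""
    attempts = max_retries + 1  # one initial attempt plus the retries
    for attempt in range(attempts):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the last timeout


def test_retry_recovers_after_one_timeout():
    # First call times out, second call succeeds.
    send = mock.Mock(side_effect=[TimeoutError(), "ok"])
    assert call_with_retry(send, max_retries=2) == "ok"
    assert send.call_count == 2


def test_retry_gives_up_after_max_retries():
    # Every call times out, so the helper should eventually re-raise.
    send = mock.Mock(side_effect=TimeoutError())
    try:
        call_with_retry(send, max_retries=2)
    except TimeoutError:
        pass
    else:
        raise AssertionError("expected TimeoutError")
    assert send.call_count == 3  # 1 initial attempt + 2 retries
```

Covering both the recovery path and the exhaustion path keeps the retry limit itself under test, not just the happy case.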

Cache Management (tests/test_cache.py)

Tests cache hit/miss behavior, TTL expiry, and concurrent access.

Coverage areas:

  • Cache hit (key found, fresh)
  • Cache miss (key not found or expired)
  • TTL expiry handling
  • Cache invalidation
  • LRU eviction
  • Concurrent access / race conditions
  • Size limits

Example:

@pytest.mark.phase2
def test_cache_ttl_expiry():
    """Item should expire after TTL."""
    cache = LRUCache(ttl_seconds=1)
    cache.set("key", "value")
    assert cache.get("key") == "value"

    time.sleep(1.1)  # Wait past TTL
    assert cache.get("key") is None
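
The sleep-based test above takes more than a second per run and depends on wall-clock timing. One alternative is to inject the clock so the test can advance time instantly. The sketch below uses a stand-in TTLCache with a clock parameter; whether TokenPak's LRUCache accepts an injectable clock is an assumption, and if it does not, freezegun or patching the time source are other options.

```python
import time


class TTLCache:
    """Stand-in cache with a per-item TTL; not TokenPak's LRUCache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop the expired entry
            return None
        return value


class FakeClock:
    """Manually advanced clock, so tests never sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def test_cache_ttl_expiry_without_sleep():
    clock = FakeClock()
    cache = TTLCache(ttl_seconds=1, clock=clock)
    cache.set("key", "value")
    assert cache.get("key") == "value"

    clock.advance(1.1)  # move past the TTL instantly
    assert cache.get("key") is None
```

Because the fake clock is deterministic, this variant would not need the slow marker.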

Writing Tests

Test Structure

import pytest
from unittest import mock
from tokenpak.validation import validate_config
from tokenpak.core.error_handling import ValidationError

@pytest.mark.phase2
def test_meaningful_name_describes_behavior():
    """
    Docstring: 1-2 sentences explaining what is being tested and why.
    """
    # Arrange: Set up test data and mocks
    config = {"provider": "anthropic", "model": "claude-3-sonnet"}

    # Act: Execute the code under test
    result = validate_config(config)

    # Assert: Verify expected behavior
    assert result.is_valid
    assert result.provider == "anthropic"

Fixtures (Reusable Setup)

Define shared test data in conftest.py:

@pytest.fixture
def valid_config():
    """A minimal valid TokenPak config."""
    return {
        "provider": "anthropic",
        "model": "claude-3-sonnet",
        "max_tokens": 4096,
    }

@pytest.fixture
def mock_api_response():
    """Mock a successful API response."""
    return {
        "id": "msg_123",
        "content": "Hello, world!",
        "usage": {"input_tokens": 10, "output_tokens": 5}
    }

Use in tests:

def test_with_fixture(valid_config, mock_api_response):
    result = process_config(valid_config)
    assert result is not None

Mocking External Dependencies

from unittest import mock

@pytest.mark.phase2
def test_with_mock():
    """Test in isolation by mocking external calls."""
    with mock.patch('tokenpak.sdk.anthropic.call_api') as mock_call:
        mock_call.return_value = {"content": "Mocked response"}

        result = adapter.get_completion("Hello")

        assert result == "Mocked response"
        mock_call.assert_called_once()
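
The same pattern can be shown end to end with mock.patch.object, which patches an attribute on a class or object directly instead of by import path. In this sketch the Adapter class and its _call_api method are hypothetical; TokenPak's real adapter classes are not shown in this guide.

```python
from unittest import mock


class Adapter:
    """Hypothetical adapter used only for illustration."""

    def _call_api(self, prompt):
        raise RuntimeError("network access is not allowed in unit tests")

    def get_completion(self, prompt):
        response = self._call_api(prompt)
        return response["content"]


def test_get_completion_returns_content():
    adapter = Adapter()
    # Replace the network-facing method for the duration of the with-block only.
    with mock.patch.object(
        Adapter, "_call_api", return_value={"content": "Mocked response"}
    ) as mock_call:
        result = adapter.get_completion("Hello")

    assert result == "Mocked response"
    mock_call.assert_called_once_with("Hello")
```

Asserting on the call arguments, not only the call count, also catches the case where the adapter passes the wrong prompt through.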

Coverage Reports

Understand the HTML Report

pytest tests/ --cov=tokenpak --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open htmlcov/index.html

  • Red lines: not covered (no test path)
  • Yellow lines: partially covered (some branches not executed); shown only when branch coverage is enabled, for example with --cov-branch
  • Green lines: fully covered

Coverage by Module

pytest tests/ --cov=tokenpak --cov-report=term-missing

Example output:

Name                                   Stmts   Miss Cover   Missing
------------------------------------------------------------------------
tokenpak/__init__.py                      5      0   100%
tokenpak/sdk/base.py                     60      8    87%   120-125,134
tokenpak/validation/__init__.py          45      3    93%   78,102,115
tokenpak/cache/__init__.py               55     12    78%   45-60,88-92
------------------------------------------------------------------------
TOTAL                                   165     23    86%

The Missing column lists the line numbers that no test executed.

Find Untested Code Paths

  1. Look for high "Miss" counts in critical modules
  2. Read the "Missing" line numbers
  3. Add tests to cover those paths
  4. Re-run coverage to verify improvement
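
Steps 1 and 2 can be partly automated from the JSON coverage report. The sketch below assumes the report was produced with pytest tests/ --cov=tokenpak --cov-report=json, which writes coverage.json by default; rank_by_missing and load_report are hypothetical helper names, not part of TokenPak.

```python
import json
from pathlib import Path


def load_report(report_path="coverage.json"):
    """Read a coverage.py JSON report from disk."""
    return json.loads(Path(report_path).read_text())


def rank_by_missing(report, top=5):
    """Return (path, missed_count, missing_lines) tuples, worst file first."""
    rows = [
        (path, data["summary"]["missing_lines"], data["missing_lines"])
        for path, data in report["files"].items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def print_ranking(report):
    """Print one line per file: missed statement count, path, and line numbers."""
    for path, missed, lines in rank_by_missing(report):
        print(f"{missed:4d} missed  {path}  lines: {lines}")
```

Calling print_ranking(load_report()) after a coverage run gives a short worklist of files to target first.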

Continuous Integration

GitHub Actions

Coverage is checked automatically on every PR:

  1. Phase 2 tests run on Python 3.11
  2. Tier-1 gate enforced: the run fails if coverage of validation + error handling + cache is below 50%
  3. Coverage report uploaded to Codecov
  4. Delta comment posted on PR (coverage change)

On PR merge, coverage baseline is updated for next comparison.

Pre-commit Hook

Local coverage check before every commit:

git commit  # Automatically runs coverage-check.sh
  • Warns if coverage < 45%
  • Fails if coverage < 40%
  • Override: git commit --no-verify
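
The contents of coverage-check.sh are not shown in this guide. Purely as an illustration of the warn/fail thresholds listed above, the decision logic can be sketched in Python; the function names are hypothetical, and the behavior at exactly 40% or 45% is an assumption.

```python
WARN_BELOW = 45.0  # percent; warn threshold taken from the list above
FAIL_BELOW = 40.0  # percent; fail threshold taken from the list above


def classify_coverage(percent):
    """Map a total coverage percentage to 'fail', 'warn', or 'ok'."""
    if percent < FAIL_BELOW:
        return "fail"
    if percent < WARN_BELOW:
        return "warn"
    return "ok"


def exit_code(percent):
    """Only a 'fail' blocks the commit; a warning still exits with 0."""
    return 1 if classify_coverage(percent) == "fail" else 0
```

A Git hook blocks the commit only when it exits with a non-zero status, which is why the warning case maps to 0.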

Common Issues & Fixes

Coverage not updating?

Check 1: Are you running the right test file?

pytest tests/test_validation.py -v  # Should run the tests you added

Check 2: Is the code path being executed?

pytest tests/test_validation.py --cov=tokenpak.validation --cov-report=term-missing
# Check the "Missing" column and add tests for those lines

Check 3: Is coverage being measured correctly?

# Regenerate coverage DB
rm -rf .coverage htmlcov/
pytest tests/ --cov=tokenpak --cov-report=html

Test is flaky (fails randomly)?

  1. Look for: Time-dependent code, random data, external API calls
  2. Fix with: Mocks, fixtures, freezegun for time, @pytest.mark.flaky (requires a rerun plugin such as pytest-rerunfailures)
  3. Never: Use @pytest.mark.skip to hide flakiness

Test runs too slowly?

  1. Mark as @pytest.mark.slow
  2. Run fast tests with: pytest -m "not slow"
  3. Run slow tests separately in CI or nightly

Best Practices

Do:

  • Write tests before implementing features (TDD)
  • Use descriptive test names (test_cache_hit_returns_fresh_value)
  • Mock external API calls
  • Test edge cases (empty inputs, null values, exceptions)
  • Use fixtures for reusable setup
  • Keep tests focused (one behavior per test)

Don't:

  • Test implementation details (test behavior, not code)
  • Skip flaky tests (fix them instead)
  • Disable coverage for convenience
  • Write overly complex tests (refactor into smaller pieces)
  • Ignore test failures (fix the bug or mark as xfail)


Troubleshooting Test Collection

Slow or hanging collection

If pytest --collect-only is unusually slow or appears to hang, the most common cause is a fixture or module-level import that does expensive work at import time (network calls, large model loads, or reading a missing file path).

  • Run pytest --collect-only --trace-config to see which plugins and conftest.py files are loaded during collection; an unexpected or heavy plugin is a common source of delay.
  • Prefer package imports (from tokenpak.core import ...) over loading a module from a hardcoded filesystem path. Path-based loading breaks when the working directory or layout changes.
  • If a test depends on an optional module, guard the import with pytest.importorskip. If it references a file that may be absent, use a fixture that calls pytest.skip so the test skips cleanly rather than failing at collection time.
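
The last point can be sketched as a fixture that checks for the file at test time instead of at import time. The path and the names require_file and sample_corpus are hypothetical and used only for illustration.

```python
from pathlib import Path

import pytest

# Hypothetical data file; nothing is read from disk when this module is imported.
SAMPLE_CORPUS = Path("tests/data/sample_corpus.txt")


def require_file(path):
    """Return the path if it exists; otherwise skip the calling test with a reason."""
    path = Path(path)
    if not path.exists():
        pytest.skip(f"required test data not found: {path}")
    return path


@pytest.fixture
def sample_corpus():
    # Resolved when a test requests the fixture, so collection stays fast
    # and a missing file produces a skip instead of a collection error.
    return require_file(SAMPLE_CORPUS).read_text()


def test_corpus_is_not_empty(sample_corpus):
    assert sample_corpus.strip()
```

With the file absent, pytest -rs lists the skipped test together with the reason string, which makes the missing dependency visible without breaking the run.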