
Testing

Quality Assurance Framework

METAINFORMANT implements a comprehensive quality assurance framework combining automated testing, code quality checks, and validation processes.

Quality Assurance Architecture

```mermaid
graph TD
    A[Code Development] --> B[Static Analysis]
    B --> C[Type Checking]
    C --> D[Linting]

    D --> E[Unit Testing]
    E --> F[Integration Testing]
    F --> G[End-to-End Testing]

    G --> H[Performance Testing]
    H --> I[Load Testing]

    I --> J[Quality Gates]
    J --> K{All Passed?}

    K -->|Yes| L[Release Ready]
    K -->|No| M[Issue Resolution]

    M --> A

    subgraph "Automated Checks"
        N[mypy] -.-> C
        O[ruff] -.-> D
        P[black] -.-> D
        Q[isort] -.-> D
    end

    subgraph "Test Categories"
        R[pytest] -.-> E
        S[Real Data Only] -.-> E
        T[No Mocks] -.-> E
        U[Integration] -.-> F
    end

    subgraph "Quality Metrics"
        V[Coverage >90%] -.-> J
        W[Zero Critical Issues] -.-> J
        X[Performance Benchmarks] -.-> J
        Y[Documentation Complete] -.-> J
    end
```

Test Execution Pipeline

```mermaid
graph TD
    A[Test Suite] --> B[Environment Setup]
    B --> C[Dependency Verification]

    C --> D[Test Discovery]
    D --> E[Test Collection]

    E --> F{Test Category}
    F -->|Unit| G[Fast Execution]
    F -->|Integration| H[Workflow Testing]
    F -->|E2E| I[Full Pipeline]

    G --> J[Parallel Execution]
    H --> J
    I --> J

    J --> K[Result Collection]
    K --> L[Coverage Analysis]

    L --> M[Report Generation]
    M --> N{Quality Standards Met?}

    N -->|Yes| O[Quality Assurance Pass]
    N -->|No| P[Failure Analysis]

    P --> Q[Issue Classification]
    Q --> R[Fix Implementation]

    R --> S[Re-testing]
    S --> N

    subgraph "Execution Environment"
        T[uv venv] -.-> B
        U[Dependencies] -.-> C
        V[Test Data] -.-> C
        W[External Tools] -.-> C
    end

    subgraph "Test Organization"
        X[tests/test_*.py] -.-> D
        Y[Domain-specific] -.-> F
        Z[Integration Tests] -.-> F
        AA[E2E Tests] -.-> F
    end

    subgraph "Quality Metrics"
        BB[Line Coverage] -.-> L
        CC[Branch Coverage] -.-> L
        DD[Mutation Testing] -.-> L
        EE[Performance Benchmarks] -.-> L
    end
```

Code Quality Policy (STRICTLY NO MOCKS/FAKES/PLACEHOLDERS)

ABSOLUTE PROHIBITION: Never use fake/mocked/stubbed methods, objects, or network shims in source code or tests.

Source Code Policy: All production functions must perform real computations or make real API calls. NO DUMMY DATA RETURNS. Placeholder implementations that return hardcoded values are strictly prohibited.

Real Implementation Only: All code must exercise real algorithms and external behavior:

  • Networked tests: perform real HTTP requests with short timeouts. If offline, skip gracefully with clear messages.
  • CLI-dependent tests (e.g., amalgkit): run only when the dependency is available on PATH; otherwise skip with dependency notes.
  • Database tests: use real database connections or skip when unavailable.
  • API tests: make real API calls or skip when network/credentials unavailable.

Anti-Pattern Enforcement: Mocking is an anti-pattern that creates brittle tests disconnected from reality.

Quality Assurance: Real implementations reveal actual bugs, performance issues, and integration problems.

Environment Setup: It is acceptable to set environment variables for test setup, but do not monkeypatch or replace functions.

Test Artifacts: Tests must write all artifacts only under output/ directory.

Reproducibility: Prefer deterministic seeds and stable filenames for reproducible test runs.
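For example, a deterministic seed makes generated fixtures identical across runs, and encoding the seed in the artifact name keeps filenames stable (a sketch; `generate_test_sequences` is an illustrative helper, not part of the package):

```python
import random
from pathlib import Path

def generate_test_sequences(n=3, length=12, seed=42):
    """Generate reproducible DNA sequences for a test fixture."""
    rng = random.Random(seed)  # deterministic seed, independent of global state
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]

# Same seed -> identical fixtures on every run
assert generate_test_sequences() == generate_test_sequences()

# Stable filename: encode the seed so reruns overwrite the same artifact
out_path = Path("output/test_example") / "sequences_seed42.fasta"
```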

Clear Documentation: When external dependencies are unavailable, tests must clearly document what is being skipped and why.

UV-Based Testing Workflow

Environment Setup

Before running tests, ensure your test environment is properly set up:

```bash
# Setup test environment (recommended first step)
bash scripts/package/verify.sh --mode deps

# Setup for specific test types
bash scripts/package/verify.sh --mode deps --test-type fast    # Core tests only
bash scripts/package/verify.sh --mode deps --test-type network # Include network tests
bash scripts/package/verify.sh --mode deps --test-type all     # Full test suite

# Verify setup without installing
bash scripts/package/verify.sh --mode deps --verify-only
```

Test Dependency Groups

METAINFORMANT uses UV dependency groups for different test scenarios:

  • test-fast: Core functionality tests (minimal dependencies)
  • test-network: Include network-dependent tests
  • test-external: Include external CLI tool tests
  • test-all: Full test suite with all optional dependencies

Running Tests

Basic Execution

```bash
# Run all tests with minimal output
uv run pytest -q

# Run with coverage
uv run pytest --cov=src/metainformant --cov-report=html

# Run specific test files
uv run pytest tests/test_core_text.py -v
```

Note: Test scripts automatically detect FAT filesystems and configure UV cache and virtual environment locations accordingly. On FAT filesystems, tests use /tmp/metainformant_venv if available. The setup script (scripts/package/setup.sh) shows real-time test progress with verbose output, making it easy to see which tests are running. See UV Setup Guide for details.

Professional Test Runner

Use the comprehensive test runner script:

```bash
# Standard test run (automatically handles FAT filesystems)
bash scripts/package/test.sh

# Fast tests only (skip slow/network/external dependencies)
bash scripts/package/test.sh fast

# Optimized test runner
bash scripts/package/test.sh --mode fast

# Include network tests
bash scripts/package/test.sh network

# Generate coverage reports
bash scripts/package/test.sh coverage

# Run the full test suite
bash scripts/package/test.sh all
```

FAT Filesystem Support: All test scripts automatically detect FAT filesystems and use appropriate venv locations (/tmp/metainformant_venv on FAT, .venv on standard filesystems).

Test Categories

Tests are organized with custom pytest markers:

  • slow: Tests that take significant time to complete
  • network: Tests requiring internet connectivity
  • external_tool: Tests requiring external CLI tools (e.g., amalgkit)
  • integration: End-to-end integration tests

Configuration

Test configuration is managed in pyproject.toml:

```toml
[tool.pytest.ini_options]
timeout = 10  # Default 10-second timeout for all tests
addopts = ["--cov=src/metainformant"]
markers = [
    "slow: marks tests as slow",
    "network: marks tests as requiring network access",
    "external_tool: marks tests as requiring external tools",
    "integration: marks tests as integration tests",
]

[tool.coverage.run]
branch = true
omit = ["*/tests/*", "*/__pycache__/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
]
```

Timeout Configuration

Tests have a default 10-second timeout to prevent hanging. Slow tests can be marked with @pytest.mark.slow and run separately:

```bash
# Run fast tests only (default)
uv run pytest tests/

# Run all tests including slow ones
uv run pytest tests/ --runslow

# Run only slow tests
uv run pytest tests/ -m slow

# Skip network tests
uv run pytest tests/ -m "not network"

# Skip external tool tests
uv run pytest tests/ -m "not external_tool"

# Custom timeout for specific runs
uv run pytest tests/ --timeout=30
```
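Note that `--runslow` is not a built-in pytest flag; it is typically wired up in `tests/conftest.py` with a pair of hooks along these lines (a sketch of the common pattern, not necessarily this repository's exact implementation):

```python
import pytest

def pytest_addoption(parser):
    """Register the custom --runslow command-line flag."""
    parser.addoption("--runslow", action="store_true", default=False,
                     help="also run tests marked @pytest.mark.slow")

def pytest_collection_modifyitems(config, items):
    """Skip slow-marked tests unless --runslow was passed."""
    if config.getoption("--runslow"):
        return  # run everything, including slow tests
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```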

Test Structure and Organization

Directory Layout

  • tests/: All test files at repository root
  • tests/conftest.py: Shared pytest configuration and fixtures
  • tests/test_*.py: Individual test modules
  • tests/data/: Test input fixtures and data
  • output/: All test outputs and artifacts

Test Coverage Matrix

The test suite provides comprehensive coverage:

| Module | Primary Tests | Coverage |
| --- | --- | --- |
| Core Utilities | test_core_*.py | Config, I/O, text, logging, parallel, paths, cache, hash, db |
| DNA Analysis | test_dna_*.py | Sequences, alignment, MSA, phylogeny, population genetics, FASTQ |
| RNA Analysis | test_rna_*.py | amalgkit wrapper, workflow, configs, step runners |
| Single-Cell | test_singlecell_*.py | Preprocessing, dimensionality, clustering, trajectory |
| Quality Control | test_quality_*.py | FASTQ quality analysis, contamination detection |
| Protein Analysis | test_protein_*.py | UniProt, PDB, AlphaFold, InterPro, structure analysis |
| Mathematical Models | test_math_*.py | Population genetics, coalescent, epidemiology, selection |
| Simulation | test_simulation.py | Sequence generation, RNA counts, agent-based models |
| Visualization | test_visualization_*.py | Plots, trees, animations |
| Ontology | test_ontology_*.py | Gene Ontology, OBO parsing |
| CLI Interface | test_cli.py | Command-line argument parsing and dispatch |

Test Conventions

File Organization

  • One test file per source module: src/foo/bar.py → tests/test_foo_bar.py
  • Class-based organization: Group related tests in classes
  • Descriptive test names: Use clear, descriptive function names

Test Isolation

  • Setup/teardown: Use setup_method and teardown_method for clean environments
  • Independent tests: Each test should be runnable in isolation
  • Output directory: All test artifacts go to output/ subdirectories

Example Test Structure

```python
import pytest
from metainformant.core.io import ensure_directory
from metainformant.some_module import some_function

class TestSomeFunction:
    def setup_method(self):
        """Setup test environment."""
        self.output_dir = ensure_directory("output/test_some_module")
        self.test_data = {...}

    def teardown_method(self):
        """Cleanup after test."""
        # Cleanup if needed (usually not required)
        pass

    def test_basic_functionality(self):
        """Test basic function behavior."""
        result = some_function(self.test_data)
        assert result is not None

    def test_edge_cases(self):
        """Test edge cases and error conditions."""
        with pytest.raises(ValueError):
            some_function(invalid_input)

    @pytest.mark.slow
    def test_performance_intensive(self):
        """Test that takes significant time."""
        # Test implementation
        pass

    @pytest.mark.network
    def test_network_dependent(self):
        """Test requiring network access."""
        # Skip if offline
        try:
            result = some_network_function()
            assert result is not None
        except ConnectionError:
            pytest.skip("Network not available")
```

Network and External Tool Testing

Network Tests

Network-dependent tests use real API calls with graceful offline handling:

```python
import requests
import pytest

def _check_online():
    """Check if network connectivity is available."""
    try:
        response = requests.get('https://httpbin.org/status/200', timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        return False

@pytest.mark.network
@pytest.mark.skipif(not _check_online(), reason="Network not available")
def test_uniprot_api():
    """Test UniProt API with real network call."""
    from metainformant.protein.uniprot import map_ids_uniprot

    result = map_ids_uniprot(['P12345'], from_db='UniProtKB_AC-ID', to_db='Gene_Name')
    assert len(result) >= 0  # May be empty if ID not found
```

External Tool Tests

Tests requiring external CLI tools check for availability:

```python
import shutil
import subprocess
import pytest

@pytest.mark.external_tool
@pytest.mark.skipif(not shutil.which("amalgkit"), reason="amalgkit not available")
def test_amalgkit_execution():
    """Test amalgkit CLI tool integration."""
    result = subprocess.run(['amalgkit', '--version'], capture_output=True, text=True)
    assert result.returncode == 0
```

Environment Variables

Some tests require environment variables for configuration:

```bash
# For NCBI tests
export NCBI_EMAIL="your.email@example.com"

# For database tests
export TEST_DATABASE_URL="sqlite:///output/test.db"

# Run tests with environment
./scripts/package/test.sh --mode network
```

Test Data Management

Input Fixtures

  • Small test data: Include directly in tests/data/
  • Generated test data: Create programmatically in setup_method
  • Large test data: Download during test execution (with caching)

Output Management

  • Consistent paths: Use metainformant.core.io.ensure_directory
  • Deterministic names: Use consistent, predictable filenames
  • Cleanup policy: Generally leave outputs for inspection (they're in output/)

Example Test Data Handling

```python
def setup_method(self):
    """Setup test environment with data."""
    self.output_dir = ensure_directory("output/test_analysis")

    # Create test data
    self.test_sequences = [
        "ATCGATCGATCG",
        "GCTAGCTAGCTA",
        "TTTTAAAACCCC"
    ]

    # Write test FASTA
    self.test_fasta = f"{self.output_dir}/test.fasta"
    with open(self.test_fasta, 'w') as f:
        for i, seq in enumerate(self.test_sequences):
            f.write(f">seq_{i}\n{seq}\n")
```

Continuous Integration

The test suite is designed for CI/CD environments:

CI Configuration

```yaml
# Example GitHub Actions steps
- name: Run fast tests
  run: ./scripts/package/test.sh --mode fast

- name: Run network tests
  run: ./scripts/package/test.sh --mode network
  if: env.ENABLE_NETWORK_TESTS == 'true'

- name: Generate coverage report
  run: ./scripts/package/test.sh --mode coverage
```

Test Selection

  • Default CI: Run fast tests only
  • Nightly builds: Include slow and network tests
  • Release testing: Full test suite including integration tests

Performance Considerations

Test Execution Speed

  • Fast tests: < 1 second per test
  • Slow tests: Marked with @pytest.mark.slow
  • Parallel execution: Tests designed for parallel execution
  • Resource usage: Tests avoid excessive memory/CPU usage
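The fast-test budget can be enforced directly in a test (an illustrative helper; the 1-second default matches the guideline above):

```python
import time

def assert_within_budget(fn, budget_s=1.0):
    """Run fn and fail if it exceeds the fast-test time budget."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    assert elapsed < budget_s, f"took {elapsed:.2f}s, budget {budget_s}s"
    return result

# Example: a trivially fast operation passes
assert_within_budget(lambda: sum(range(1000)))
```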

Large Dataset Testing

```python
@pytest.mark.slow
def test_large_dataset_processing(self):
    """Test with large synthetic dataset."""
    # Generate large test data efficiently
    large_data = generate_synthetic_data(n_samples=100000)

    # Test processing
    result = process_large_data(large_data)
    assert len(result) == len(large_data)
```

Development Workflow

Test-Driven Development

  1. Write failing test: Implement test for new functionality
  2. Implement feature: Write minimal code to pass test
  3. Refactor: Improve implementation while maintaining test passage
  4. Add edge cases: Test error conditions and boundary cases
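A minimal illustration of the first two steps, using a hypothetical gc_content function (not from the package):

```python
# Step 1: write the failing test first
def test_gc_content():
    assert gc_content("ATGC") == 0.5
    assert gc_content("gggg") == 1.0
    assert gc_content("") == 0.0  # edge case: empty input

# Step 2: minimal implementation that makes it pass
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a sequence (0.0 for empty input)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)

test_gc_content()  # now passes
```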

Running Tests During Development

```bash
# Run tests for specific module during development
uv run pytest tests/test_core_text.py -v --tb=short

# Watch mode for continuous testing (with external tool)
# uv pip install pytest-watch
ptw tests/test_core_text.py

# Run with immediate failure reporting
uv run pytest -x tests/test_core_text.py
```

Troubleshooting

UV-Related Issues

UV Not Found

```text
❌ uv is not installed or not in PATH
   Please install uv first:
   curl -LsSf https://astral.sh/uv/install.sh | sh
   Or visit: https://github.com/astral-sh/uv
```

Solution: Install uv and ensure it's in your PATH:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.cargo/bin:$PATH"
```

Dependency Sync Issues

❌ Failed to sync dependencies

Solution: Clear UV cache and retry:

```bash
rm -rf .uv-cache/
bash scripts/package/verify.sh --mode deps
```

FAT Filesystem Issues

📁 FAT filesystem detected - using /tmp locations

Note: This is normal on FAT filesystems (exFAT, FAT32). The system automatically uses /tmp for virtual environments and caches.

Test Environment Not Ready

❌ pytest not available

Solution: Setup test environment:

```bash
bash scripts/package/verify.sh --mode deps --test-type fast
```

Network Tests Failing

⚠️  Network connectivity check failed

Note: Network tests require internet access. They will be skipped gracefully if offline.

External Tool Tests Failing

⚠️  amalgkit not available - some tests may be skipped

Note: External tool tests require specific CLI tools. Install them or tests will be skipped.

Common Test Failures

Import Errors

ModuleNotFoundError: No module named 'metainformant'

Solution: Ensure you're running tests from the repository root and PYTHONPATH is set:

```bash
cd /path/to/metainformant
export PYTHONPATH="$PWD/src:$PYTHONPATH"
uv run pytest tests/
```

Permission Errors

PermissionError: [Errno 13] Permission denied

Solution: Ensure output directories are writable:

```bash
chmod -R u+w output/
```

Memory Issues

MemoryError: Out of memory

Solution: Run fewer tests in parallel or increase system memory:

```bash
uv run pytest tests/test_core_*.py  # Run smaller test sets
```

Related: CLI, Core