METAINFORMANT implements a comprehensive quality assurance framework combining automated testing, code quality checks, and validation processes.
```mermaid
graph TD
    A[Code Development] --> B[Static Analysis]
    B --> C[Type Checking]
    C --> D[Linting]
    D --> E[Unit Testing]
    E --> F[Integration Testing]
    F --> G[End-to-End Testing]
    G --> H[Performance Testing]
    H --> I[Load Testing]
    I --> J[Quality Gates]
    J --> K{All Passed?}
    K -->|Yes| L[Release Ready]
    K -->|No| M[Issue Resolution]
    M --> A

    subgraph "Automated Checks"
        N[mypy] -.-> C
        O[ruff] -.-> D
        P[black] -.-> D
        Q[isort] -.-> D
    end

    subgraph "Test Categories"
        R[pytest] -.-> E
        S[Real Data Only] -.-> E
        T[No Mocks] -.-> E
        U[Integration] -.-> F
    end

    subgraph "Quality Metrics"
        V["Coverage >90%"] -.-> J
        W[Zero Critical Issues] -.-> J
        X[Performance Benchmarks] -.-> J
        Y[Documentation Complete] -.-> J
    end
```
```mermaid
graph TD
    A[Test Suite] --> B[Environment Setup]
    B --> C[Dependency Verification]
    C --> D[Test Discovery]
    D --> E[Test Collection]
    E --> F{Test Category}
    F -->|Unit| G[Fast Execution]
    F -->|Integration| H[Workflow Testing]
    F -->|E2E| I[Full Pipeline]
    G --> J[Parallel Execution]
    H --> J
    I --> J
    J --> K[Result Collection]
    K --> L[Coverage Analysis]
    L --> M[Report Generation]
    M --> N{Quality Standards Met?}
    N -->|Yes| O[Quality Assurance Pass]
    N -->|No| P[Failure Analysis]
    P --> Q[Issue Classification]
    Q --> R[Fix Implementation]
    R --> S[Re-testing]
    S --> N

    subgraph "Execution Environment"
        T[uv venv] -.-> B
        U[Dependencies] -.-> C
        V[Test Data] -.-> C
        W[External Tools] -.-> C
    end

    subgraph "Test Organization"
        X["tests/test_*.py"] -.-> D
        Y[Domain-specific] -.-> F
        Z[Integration Tests] -.-> F
        AA[E2E Tests] -.-> F
    end

    subgraph "Quality Metrics"
        BB[Line Coverage] -.-> L
        CC[Branch Coverage] -.-> L
        DD[Mutation Testing] -.-> L
        EE[Performance Benchmarks] -.-> L
    end
```
- **ABSOLUTE PROHIBITION:** Never use fake/mocked/stubbed methods, objects, or network shims in source code or tests.
- **Source Code Policy:** All production functions must perform real computations or make real API calls. No dummy data returns: placeholder implementations that return hardcoded values are strictly prohibited.
- **Real Implementation Only:** All code must exercise real algorithms and external behavior:
  - Networked tests perform real HTTP requests with short timeouts; if offline, they skip gracefully with clear messages.
  - CLI-dependent tests (e.g., amalgkit) run only when the dependency is available on PATH; otherwise they skip with dependency notes.
  - Database tests use real database connections or skip when unavailable.
  - API tests make real API calls or skip when network/credentials are unavailable.
- **Anti-Pattern Enforcement:** Mocking is an anti-pattern that creates brittle tests disconnected from reality.
- **Quality Assurance:** Real implementations reveal actual bugs, performance issues, and integration problems.
- **Environment Setup:** Setting environment variables for test setup is acceptable, but do not monkeypatch or replace functions.
- **Test Artifacts:** Tests must write all artifacts only under the `output/` directory.
- **Reproducibility:** Prefer deterministic seeds and stable filenames for reproducible test runs.
- **Clear Documentation:** When external dependencies are unavailable, tests must clearly document what is being skipped and why.
Before running tests, ensure your test environment is properly set up:

```bash
# Setup test environment (recommended first step)
bash scripts/package/verify.sh --mode deps

# Setup for specific test types
bash scripts/package/verify.sh --mode deps --test-type fast     # Core tests only
bash scripts/package/verify.sh --mode deps --test-type network  # Include network tests
bash scripts/package/verify.sh --mode deps --test-type all      # Full test suite

# Verify setup without installing
bash scripts/package/verify.sh --mode deps --verify-only
```

METAINFORMANT uses UV dependency groups for different test scenarios:

- `test-fast`: Core functionality tests (minimal dependencies)
- `test-network`: Include network-dependent tests
- `test-external`: Include external CLI tool tests
- `test-all`: Full test suite with all optional dependencies
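Assuming these groups are declared as PEP 735 dependency groups in `pyproject.toml`, the layout might look roughly like this. This is a hypothetical sketch; the actual group names above are real, but their contents here are illustrative, not copied from the repository:

```toml
# Hypothetical sketch of UV dependency groups; actual contents differ.
[dependency-groups]
test-fast = ["pytest", "pytest-timeout", "pytest-cov"]
test-network = [{include-group = "test-fast"}, "requests"]
test-external = [{include-group = "test-fast"}]
test-all = [
    {include-group = "test-network"},
    {include-group = "test-external"},
]
```

A group can then be installed with `uv sync --group test-fast`.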
```bash
# Run all tests with minimal output
uv run pytest -q

# Run with coverage
uv run pytest --cov=src/metainformant --cov-report=html

# Run specific test files
uv run pytest tests/test_core_text.py -v
```

Note: Test scripts automatically detect FAT filesystems and configure UV cache and virtual environment locations accordingly. On FAT filesystems, tests use `/tmp/metainformant_venv` if available. The setup script (`scripts/package/setup.sh`) shows real-time test progress with verbose output, making it easy to see which tests are running. See the UV Setup Guide for details.
Use the comprehensive test runner script:

```bash
# Standard test run (automatically handles FAT filesystems)
bash scripts/package/test.sh

# Fast tests only (skip slow/network/external dependencies)
bash scripts/package/test.sh fast

# Optimized test runner
bash scripts/package/test.sh --mode fast

# Include network tests
bash scripts/package/test.sh network

# Generate coverage reports
bash scripts/package/test.sh coverage

# Run tests matching a pattern
bash scripts/package/test.sh all
```

FAT Filesystem Support: All test scripts automatically detect FAT filesystems and use appropriate venv locations (`/tmp/metainformant_venv` on FAT, `.venv` on standard filesystems).
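The detection step these scripts perform can be illustrated with a small sketch (hypothetical logic; the real scripts implement this in shell): on Linux, the filesystem type of the repository's mount point can be looked up in `/proc/mounts` and compared against FAT-family types.

```python
# Hypothetical sketch of FAT detection; not the actual script implementation.
def fs_type_for(target: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing target.

    mounts_text has /proc/mounts format: 'device mountpoint fstype options ...'.
    """
    best_mount, best_type = "", ""
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mount_point, fstype = parts[1], parts[2]
        # The longest matching mount point wins (most specific mount).
        if target.startswith(mount_point) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_fat(target: str, mounts_text: str) -> bool:
    """True if target lives on a FAT-family filesystem (exFAT, FAT32, ...)."""
    return fs_type_for(target, mounts_text) in {"vfat", "exfat", "msdos"}
```

When `is_fat` reports True, a script would redirect the venv to `/tmp/metainformant_venv` instead of `.venv`.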
Tests are organized with custom pytest markers:

- `slow`: Tests that take significant time to complete
- `network`: Tests requiring internet connectivity
- `external_tool`: Tests requiring external CLI tools (e.g., amalgkit)
- `integration`: End-to-end integration tests
Test configuration is managed in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
timeout = 10  # Default 10-second timeout for all tests
addopts = ["--cov=src/metainformant"]
markers = [
    "slow: marks tests as slow",
    "network: marks tests as requiring network access",
    "external_tool: marks tests as requiring external tools",
    "integration: marks tests as integration tests",
]

[tool.coverage.run]
branch = true
omit = ["*/tests/*", "*/__pycache__/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
]
```

Tests have a default 10-second timeout to prevent hanging. Slow tests can be marked with `@pytest.mark.slow` and run separately:
```bash
# Run fast tests only (default)
uv run pytest tests/

# Run all tests including slow ones
uv run pytest tests/ --runslow

# Run only slow tests
uv run pytest tests/ -m slow

# Skip network tests
uv run pytest tests/ -m "not network"

# Skip external tool tests
uv run pytest tests/ -m "not external_tool"

# Custom timeout for specific runs
uv run pytest tests/ --timeout=30
```

- `tests/`: All test files at repository root
- `tests/conftest.py`: Shared pytest configuration and fixtures
- `tests/test_*.py`: Individual test modules
- `tests/data/`: Test input fixtures and data
- `output/`: All test outputs and artifacts
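The `--runslow` flag is not built into pytest; it is typically implemented with two hooks in `tests/conftest.py`. The following is a plausible sketch of that logic (assumed, not copied from the repository's conftest):

```python
# Hypothetical conftest.py sketch: --runslow registers a pytest option, and
# collection-time hooks skip @pytest.mark.slow tests unless the flag is given.
import pytest


def pytest_addoption(parser):
    # Register the custom command-line flag with pytest.
    parser.addoption(
        "--runslow",
        action="store_true",
        default=False,
        help="also run tests marked @pytest.mark.slow",
    )


def pytest_collection_modifyitems(config, items):
    # Without --runslow, attach a skip marker to every slow test.
    if config.getoption("--runslow"):
        return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

This pattern is the one documented in pytest's own examples for conditionally skipping marked tests.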
The test suite provides comprehensive coverage:

| Module | Primary Tests | Coverage |
|---|---|---|
| Core Utilities | `test_core_*.py` | Config, I/O, text, logging, parallel, paths, cache, hash, db |
| DNA Analysis | `test_dna_*.py` | Sequences, alignment, MSA, phylogeny, population genetics, FASTQ |
| RNA Analysis | `test_rna_*.py` | amalgkit wrapper, workflow, configs, step runners |
| Single-Cell | `test_singlecell_*.py` | Preprocessing, dimensionality, clustering, trajectory |
| Quality Control | `test_quality_*.py` | FASTQ quality analysis, contamination detection |
| Protein Analysis | `test_protein_*.py` | UniProt, PDB, AlphaFold, InterPro, structure analysis |
| Mathematical Models | `test_math_*.py` | Population genetics, coalescent, epidemiology, selection |
| Simulation | `test_simulation.py` | Sequence generation, RNA counts, agent-based models |
| Visualization | `test_visualization_*.py` | Plots, trees, animations |
| Ontology | `test_ontology_*.py` | Gene Ontology, OBO parsing |
| CLI Interface | `test_cli.py` | Command-line argument parsing and dispatch |
- **One test file per source module:** `src/foo/bar.py` → `tests/test_foo_bar.py`
- **Class-based organization:** Group related tests in classes
- **Descriptive test names:** Use clear, descriptive function names
- **Setup/teardown:** Use `setup_method` and `teardown_method` for clean environments
- **Independent tests:** Each test should be runnable in isolation
- **Output directory:** All test artifacts go to `output/` subdirectories
```python
import pytest

from metainformant.core.io import ensure_directory
from metainformant.some_module import some_function


class TestSomeFunction:
    def setup_method(self):
        """Setup test environment."""
        self.output_dir = ensure_directory("output/test_some_module")
        self.test_data = {...}

    def teardown_method(self):
        """Cleanup after test."""
        # Cleanup if needed (usually not required)
        pass

    def test_basic_functionality(self):
        """Test basic function behavior."""
        result = some_function(self.test_data)
        assert result is not None

    def test_edge_cases(self):
        """Test edge cases and error conditions."""
        with pytest.raises(ValueError):
            some_function(invalid_input)

    @pytest.mark.slow
    def test_performance_intensive(self):
        """Test that takes significant time."""
        # Test implementation
        pass

    @pytest.mark.network
    def test_network_dependent(self):
        """Test requiring network access."""
        # Skip if offline
        try:
            result = some_network_function()
            assert result is not None
        except ConnectionError:
            pytest.skip("Network not available")
```

Network-dependent tests use real API calls with graceful offline handling:
```python
import pytest
import requests


def _check_online():
    """Check if network connectivity is available."""
    try:
        response = requests.get('https://httpbin.org/status/200', timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        return False


@pytest.mark.network
@pytest.mark.skipif(not _check_online(), reason="Network not available")
def test_uniprot_api():
    """Test UniProt API with real network call."""
    from metainformant.protein.uniprot import map_ids_uniprot

    result = map_ids_uniprot(['P12345'], from_db='UniProtKB_AC-ID', to_db='Gene_Name')
    assert len(result) >= 0  # May be empty if ID not found
```

Tests requiring external CLI tools check for availability:
```python
import shutil
import subprocess

import pytest


@pytest.mark.external_tool
@pytest.mark.skipif(not shutil.which("amalgkit"), reason="amalgkit not available")
def test_amalgkit_execution():
    """Test amalgkit CLI tool integration."""
    result = subprocess.run(['amalgkit', '--version'], capture_output=True, text=True)
    assert result.returncode == 0
```

Some tests require environment variables for configuration:
```bash
# For NCBI tests
export NCBI_EMAIL="your.email@example.com"

# For database tests
export TEST_DATABASE_URL="sqlite:///output/test.db"

# Run tests with environment
./scripts/package/test.sh --mode network
```

- **Small test data:** Include directly in `tests/data/`
- **Generated test data:** Create programmatically in `setup_method`
- **Large test data:** Download during test execution (with caching)
- **Consistent paths:** Use `metainformant.core.io.ensure_directory`
- **Deterministic names:** Use consistent, predictable filenames
- **Cleanup policy:** Generally leave outputs for inspection (they're in `output/`)
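The "download during test execution, with caching" policy can be sketched as a small helper. This is hypothetical: `fetch_test_data` is not part of the metainformant API, but it shows the shape such a helper takes, including the deterministic filename required by the reproducibility policy.

```python
# Hypothetical caching download helper (not part of the metainformant API).
from pathlib import Path
from urllib.request import urlopen


def fetch_test_data(url: str, cache_dir: str = "output/test_cache") -> Path:
    """Download url into cache_dir once; reuse the cached copy afterwards."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Deterministic, stable filename derived from the URL (reproducibility policy).
    dest = cache / url.rsplit("/", 1)[-1]
    if not dest.exists():
        # Real HTTP request with a short timeout, per the no-mocks policy.
        with urlopen(url, timeout=10) as resp:
            dest.write_bytes(resp.read())
    return dest
```

A test calling this in `setup_method` pays the download cost only once per run directory, and the artifact lands under `output/` as required.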
```python
def setup_method(self):
    """Setup test environment with data."""
    self.output_dir = ensure_directory("output/test_analysis")

    # Create test data
    self.test_sequences = [
        "ATCGATCGATCG",
        "GCTAGCTAGCTA",
        "TTTTAAAACCCC",
    ]

    # Write test FASTA
    self.test_fasta = f"{self.output_dir}/test.fasta"
    with open(self.test_fasta, 'w') as f:
        for i, seq in enumerate(self.test_sequences):
            f.write(f">seq_{i}\n{seq}\n")
```

The test suite is designed for CI/CD environments:
```yaml
# Example GitHub Actions
- name: Run fast tests
  run: ./scripts/package/test.sh --mode fast

- name: Run network tests
  run: ./scripts/package/test.sh --mode network
  if: env.ENABLE_NETWORK_TESTS == 'true'

- name: Generate coverage report
  run: ./scripts/package/test.sh --mode coverage
```

- **Default CI:** Run fast tests only
- **Nightly builds:** Include slow and network tests
- **Release testing:** Full test suite including integration tests
- **Fast tests:** < 1 second per test
- **Slow tests:** Marked with `@pytest.mark.slow`
- **Parallel execution:** Tests designed for parallel execution
- **Resource usage:** Tests avoid excessive memory/CPU usage
```python
@pytest.mark.slow
def test_large_dataset_processing(self):
    """Test with large synthetic dataset."""
    # Generate large test data efficiently
    large_data = generate_synthetic_data(n_samples=100000)

    # Test processing
    result = process_large_data(large_data)
    assert len(result) == len(large_data)
```

- **Write failing test:** Implement test for new functionality
- **Implement feature:** Write minimal code to pass test
- **Refactor:** Improve implementation while maintaining test passage
- **Add edge cases:** Test error conditions and boundary cases
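As a self-contained illustration of this cycle (the function and tests below are hypothetical, not taken from the repository): the test is written first and fails, just enough code is then added to make it pass, and edge cases follow.

```python
# Hypothetical TDD example: reverse_complement is illustrative, not a
# metainformant API. Step 1 wrote the failing test; step 2 wrote this code.
def reverse_complement(seq: str) -> str:
    """Minimal implementation written to satisfy the first failing test."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.translate(table)[::-1]


def test_reverse_complement_basic():
    # This test existed (and failed) before the implementation above.
    assert reverse_complement("ATCG") == "CGAT"


def test_reverse_complement_edge_cases():
    # Boundary cases added after the first version passed (step 4).
    assert reverse_complement("") == ""
    assert reverse_complement("A") == "T"
```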
```bash
# Run tests for specific module during development
uv run pytest tests/test_core_text.py -v --tb=short

# Watch mode for continuous testing (with external tool)
# uv pip install pytest-watch
ptw tests/test_core_text.py

# Run with immediate failure reporting
uv run pytest -x tests/test_core_text.py
```

```text
❌ uv is not installed or not in PATH
Please install uv first:
    curl -LsSf https://astral.sh/uv/install.sh | sh
Or visit: https://github.com/astral-sh/uv
```

Solution: Install uv and ensure it's in your PATH:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.cargo/bin:$PATH"
```

```text
❌ Failed to sync dependencies
```

Solution: Clear the UV cache and retry:

```bash
rm -rf .uv-cache/
bash scripts/package/verify.sh --mode deps
```

```text
📁 FAT filesystem detected - using /tmp locations
```

Note: This is normal on FAT filesystems (exFAT, FAT32). The system automatically uses `/tmp` for virtual environments and caches.

```text
❌ pytest not available
```

Solution: Set up the test environment:

```bash
bash scripts/package/verify.sh --mode deps --test-type fast
```

```text
⚠️ Network connectivity check failed
```

Note: Network tests require internet access. They will be skipped gracefully if offline.

```text
⚠️ amalgkit not available - some tests may be skipped
```

Note: External tool tests require specific CLI tools. Install them or the tests will be skipped.
```text
ModuleNotFoundError: No module named 'metainformant'
```

Solution: Ensure you're running tests from the repository root and `PYTHONPATH` is set:

```bash
cd /path/to/metainformant
export PYTHONPATH="$PWD/src:$PYTHONPATH"
uv run pytest tests/
```

```text
PermissionError: [Errno 13] Permission denied
```

Solution: Ensure output directories are writable:

```bash
chmod -R u+w output/
```

```text
MemoryError: Out of memory
```

Solution: Run fewer tests in parallel or increase system memory:

```bash
uv run pytest tests/test_core_*.py  # Run smaller test sets
```