Note: Most module functionality is accessed via Python imports, not the CLI. Currently implemented CLI commands: `--version`, `--modules`, `--help`, `protein` subcommands, `rna` subcommands, and `gwas` subcommands. Commands listed below for other modules (ontology, phenotype, networks, multiomics, singlecell, quality, simulation, visualization, epigenome, ecology, ml, information, life-events, longread, metagenomics, structural-variants, spatial, pharmacogenomics, metabolomics, menu) represent planned CLI features that are not yet implemented.
Entry: `uv run python -m metainformant` or `uv run metainformant`.
uv run metainformant setup --with-amalgkit --ncbi-email "you@example.com"
uv run metainformant dna fetch --assembly GCF_000001405.40
uv run metainformant rna plan --work-dir output/amalgkit/work --threads 8 --species Apis_mellifera
uv run metainformant rna plan-species --work-dir output/amalgkit/work --threads 8 --taxon-id 7460 --tissue brain --tissue muscle
uv run metainformant rna plan-config --config config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml
uv run metainformant rna run --work-dir output/amalgkit/work --threads 8 --species Apis_mellifera --check
uv run metainformant rna run-config --config config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml --check
uv run metainformant gwas run --config config/gwas/gwas_template.yaml
uv run metainformant gwas run --config config/gwas/gwas_template.yaml --check
uv run metainformant protein taxon-ids --file tests/data/protein/taxon_id_list.txt
uv run metainformant protein comp --fasta data/protein/example.faa
uv run metainformant protein rmsd-ca --pdb-a file1.pdb --pdb-b file2.pdb
uv run metainformant math selection --help
uv run metainformant ontology run --go data/go.obo --output output/ontology
uv run metainformant phenotype run --input data/phenotypes.json --output output/phenotype
uv run metainformant networks run --input data/interactions.tsv --output output/networks
uv run metainformant multiomics run --genomics data/genomics.tsv --transcriptomics data/rna.tsv --output output/multiomics
uv run metainformant singlecell run --input data/counts.h5ad --output output/singlecell --qc --normalize
uv run metainformant quality run --fastq data/reads.fq --output output/quality --analyze-fastq
uv run metainformant simulation run --model sequences --output output/simulation --n 1000
uv run metainformant visualization run --input data/results.csv --plot-type heatmap --output output/visualization
uv run metainformant epigenome run --methylation data/methylation.tsv --output output/epigenome --compute-beta
uv run metainformant ecology run --input data/abundance.tsv --output output/ecology --diversity
uv run metainformant ml run --features data/features.csv --labels data/labels.csv --output output/ml --classify
uv run metainformant information entropy --input data/sequences.fasta --k 1
uv run metainformant information mutual-information --x data/variable1.csv --y data/variable2.csv --output output/information
uv run metainformant information profile --sequences data/sequences.fasta --k 2 --visualize
uv run metainformant life-events embed --input data/event_sequences.json --output output/life_events/embeddings --embedding-dim 100
uv run metainformant life-events predict --events data/event_sequences.json --model output/life_events/model.pkl --output output/life_events/predictions
uv run metainformant life-events interpret --model output/life_events/model.pkl --sequences data/event_sequences.json --output output/life_events/interpretation
uv run metainformant longread run --input data/reads.fastq --output output/longread --assembler flye
uv run metainformant metagenomics run --input data/metagenome.fastq --output output/metagenomics --profile
uv run metainformant structural-variants run --bam data/aligned.bam --output output/structural_variants --detect-sv
uv run metainformant spatial run --input data/spatial_counts.h5ad --output output/spatial --tissue-map
uv run metainformant pharmacogenomics run --vcf data/variants.vcf --output output/pharmacogenomics --drug-interactions
uv run metainformant metabolomics run --input data/mzml/sample.mzML --output output/metabolomics --identify
uv run metainformant menu
uv run metainformant tests -q
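When several of these invocations need to be scripted, the argument list can be assembled programmatically. The sketch below is a minimal illustration, not part of metainformant: `build_command` is a hypothetical helper, and it assumes only the flag spellings shown in the examples above (boolean switches such as `--check`, repeatable options such as `--tissue`).

```python
import shlex


def build_command(*subcommand, **options):
    """Build an argv list for `uv run metainformant ...` (hypothetical helper)."""
    argv = ["uv", "run", "metainformant", *subcommand]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)              # boolean switch such as --check
        elif value is False or value is None:
            continue                       # option not requested
        elif isinstance(value, (list, tuple)):
            for item in value:             # repeatable option such as --tissue
                argv += [flag, str(item)]
        else:
            argv += [flag, str(value)]
    return argv


cmd = build_command(
    "rna", "plan-species",
    work_dir="output/amalgkit/work",
    threads=8,
    taxon_id=7460,
    tissue=["brain", "muscle"],
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues for paths that contain spaces.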
Subcommands
- setup: runs repository setup (uv, dependencies); supports `--with-amalgkit` and `--ncbi-email` options
- dna fetch: validates assembly accessions (see DNA Accessions)
- rna plan: prints an ordered plan of subcommands and parameters (see RNA Workflow)
- rna plan-species: plans workflow with species/tissue parameters; requires `--work-dir` and `--threads`; optional `--taxon-id` and repeatable `--tissue` filters
- rna plan-config: plans workflow from a config file without executing; requires `--config` path to YAML/TOML/JSON file
- rna run: executes the workflow; use `--check` to stop on first failure; logs written in `work-dir/logs` (default examples place this under `output/`)
- rna run-config: executes the workflow from a config file under `config/`; logs and manifest written under paths specified by the config
- gwas run: executes GWAS workflow from configuration file (see GWAS Workflow); use `--check` to validate configuration without execution
- protein taxon-ids: reads and prints taxon IDs from file (see Protein Proteomes)
- protein comp: calculates amino acid composition for sequences in FASTA
- protein rmsd-ca: computes Kabsch RMSD using CA atoms from two PDB files
- math selection: selection model experiments and visualizations (see Math Selection)
- ontology run: ontology analysis workflow (GO term queries, enrichment analysis, ontology summaries)
  - Options: `--go` (OBO file), `--output` (default: output/ontology), `--query-term`, `--ancestors`, `--descendants`
- phenotype run: phenotype analysis workflow (trait statistics, correlations, AntWiki data integration)
  - Options: `--input` (required, JSON/CSV/TSV), `--output` (default: output/phenotype), `--analyze-statistics`, `--analyze-correlations`
- networks run: network analysis workflow (metrics, community detection, centrality measures)
  - Options: `--input` (required, edge list), `--output` (default: output/networks), `--analyze-metrics`, `--detect-communities`, `--analyze-centrality`
- multiomics run: multi-omics integration workflow (joint PCA, NMF, CCA across genomics/transcriptomics/proteomics)
  - Options: `--genomics`, `--transcriptomics`, `--proteomics`, `--output` (default: output/multiomics), `--joint-pca`, `--joint-nmf`, `--canonical-correlation`
- singlecell run: single-cell analysis workflow (QC, normalization, dimensionality reduction, clustering)
  - Options: `--input` (required, count matrix), `--output` (default: output/singlecell), `--qc`, `--normalize`, `--cluster`
- quality run: quality control workflow (FASTQ metrics, contamination detection)
  - Options: `--fastq`, `--output` (default: output/quality), `--analyze-fastq`, `--detect-contamination`
- simulation run: simulation workflow (synthetic sequences, agent-based models, expression simulation)
  - Options: `--model` (required: sequences/agents/expression), `--output` (default: output/simulation), `--n` (sequences count), `--steps` (simulation steps)
- visualization run: visualization workflow (publication-quality plots, heatmaps, animations, histograms)
  - Options: `--input` (required, data file), `--plot-type` (required: lineplot/heatmap/animation/histogram), `--output` (default: output/visualization)
- epigenome run: epigenome analysis workflow (DNA methylation patterns, chromatin accessibility tracks)
  - Options: `--methylation` (CpG table), `--bedgraph` (track file), `--output` (default: output/epigenome), `--compute-beta`
- ecology run: ecology analysis workflow (community diversity metrics, species richness, beta diversity)
  - Options: `--input` (required, abundance table), `--output` (default: output/ecology), `--diversity`, `--beta-diversity`
- ml run: machine learning pipeline workflow (feature selection, classification, regression, validation)
  - Options: `--features` (required, feature matrix), `--labels` (optional), `--output` (default: output/ml), `--classify`, `--regress`, `--feature-selection`
- information entropy: calculates Shannon entropy for sequences or data files (see Information Theory)
- information mutual-information: calculates mutual information between two variables/data files
- information profile: calculates information profile for sequences with optional visualization
- life-events embed: learns event embeddings from life course event sequences (see Life Events)
- life-events predict: predicts life outcomes from event sequences using pre-trained models
- life-events interpret: interprets model predictions and provides feature importance analysis
- longread run: long-read sequencing analysis workflow (assembly, error correction, quality metrics)
  - Options: `--input` (required, FASTQ/FASTA), `--output` (default: output/longread), `--assembler` (flye/canu), `--error-correct`
- metagenomics run: metagenomic analysis workflow (taxonomic profiling, functional annotation, community analysis)
  - Options: `--input` (required, FASTQ), `--output` (default: output/metagenomics), `--profile`, `--functional`, `--diversity`
- structural-variants run: structural variant detection workflow (SV/CNV calling, breakpoint resolution)
  - Options: `--bam` (required, aligned BAM), `--output` (default: output/structural_variants), `--detect-sv`, `--detect-cnv`
- spatial run: spatial transcriptomics workflow (tissue mapping, spatial statistics, neighborhood analysis)
  - Options: `--input` (required, count matrix), `--output` (default: output/spatial), `--tissue-map`, `--spatial-stats`
- pharmacogenomics run: pharmacogenomics workflow (drug-gene interactions, variant interpretation)
  - Options: `--vcf` (required, variant file), `--output` (default: output/pharmacogenomics), `--drug-interactions`, `--star-alleles`
- metabolomics run: metabolomics analysis workflow (MS data processing, metabolite identification, pathway mapping)
  - Options: `--input` (required, mzML/mzXML), `--output` (default: output/metabolomics), `--identify`, `--pathway-map`
- menu: launches interactive CLI menu for workflow discovery and navigation
- tests: runs the repo tests (see Testing)
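As a concrete illustration of what the `information entropy` subcommand above is described as computing, here is a minimal Shannon entropy sketch over the k-mers of a single sequence. It assumes that `--k` denotes the k-mer length; the function name `kmer_entropy` is hypothetical, and this is not the module's actual implementation.

```python
import math
from collections import Counter


def kmer_entropy(sequence, k=1):
    """Shannon entropy (in bits) of the k-mer distribution of one sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0  # sequence shorter than k: nothing to count
    counts = Counter(kmers)
    total = len(kmers)
    # H = sum over k-mers of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(kmer_entropy("ACGT"))        # four equally frequent symbols
print(kmer_entropy("AAAA"))        # a single repeated symbol
print(kmer_entropy("ACGT", k=2))   # overlapping dinucleotides AC, CG, GT
```

A uniform distribution over four symbols gives the maximum of 2 bits for k = 1, while a homopolymer gives 0 bits.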
sequenceDiagram
participant U as User
participant CLI as __main__.py
participant DNA as dna/*
participant RNA as rna/*
U->>CLI: uv run metainformant rna plan --work-dir W
CLI->>RNA: plan_workflow(config)
RNA-->>CLI: steps [(name, params)...]
U->>CLI: uv run metainformant rna run --work-dir W --check
CLI->>RNA: execute_workflow(config, check=True)
RNA->>RNA: run_amalgkit(step, params)
RNA-->>CLI: return codes
See: RNA Workflow, DNA, GWAS Workflow, Testing.
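The plan/run interaction in the diagram can be sketched in a few lines of Python. This is only an illustration of the control flow: the function names mirror the diagram, but their signatures, the config keys, and the `fake_runner` stand-in are assumptions rather than the real `rna` module API. It also assumes that, without `--check`, later steps still run after a failure.

```python
def plan_workflow(config):
    """Return an ordered list of (step_name, params) pairs (illustrative only)."""
    shared = {"work_dir": config["work_dir"], "threads": config["threads"]}
    return [(name, dict(shared)) for name in config["steps"]]


def execute_workflow(config, run_step, check=False):
    """Run each planned step; with check=True, stop at the first non-zero code."""
    return_codes = []
    for name, params in plan_workflow(config):
        code = run_step(name, params)
        return_codes.append(code)
        if check and code != 0:
            break
    return return_codes


def fake_runner(name, params):
    """Stand-in for the real step runner: pretend that 'getfastq' fails."""
    return 1 if name == "getfastq" else 0


config = {
    "work_dir": "output/amalgkit/work",
    "threads": 8,
    "steps": ["metadata", "getfastq", "quant", "merge"],  # assumed step names
}
print(execute_workflow(config, fake_runner, check=True))
print(execute_workflow(config, fake_runner, check=False))
```

With `check=True` the list of return codes ends at the failing step; with `check=False` every planned step contributes a return code.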
Interactive terminal-based tools for pipeline monitoring and workflow execution:
- `scripts/rna/monitor_tui.py`: real-time dashboard showing per-species pipeline progress, system metrics (CPU, RAM, network I/O), and active command counts. Refresh rate: 5 s. Run with `python scripts/rna/monitor_tui.py`.
- `scripts/rna/run_workflow_tui.py`: full workflow runner with TUI visualization. Runs the complete RNA-seq workflow (download → getfastq → quant) with per-sample progress display. Run with `python scripts/rna/run_workflow_tui.py --config config/amalgkit/amalgkit_pogonomyrmex_barbatus.yaml --threads 5`.
The RNA-seq pipeline uses a specific directory hierarchy under each species output:
output/amalgkit/<species>/
├── work/ # Amalgkit intermediate files (metadata, selected samples, merge outputs)
│ ├── metadata/ # Sample metadata from NCBI (metadata_selected.tsv)
│ ├── getfastq/ # Symlinks to downloaded FASTQ files
│ ├── quant/ # Kallisto quantification results per sample
│ ├── merge/ # Combined abundance matrices
│ └── logs/ # Step-level log files
├── fastq/ # Raw downloaded FASTQ files (ENA .fastq.gz or SRA extracts)
│ └── getfastq/ # Per-sample subdirectories with FASTQ pairs
└── genome/ # Reference genome FASTA + Kallisto index
Key distinction: `work/getfastq/` contains symlinks pointing to `fastq/getfastq/`, where the actual FASTQ data resides. This separation allows cleanup of raw data while preserving the workspace structure.
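The symlink arrangement can be reproduced in a throwaway directory to see how it behaves. The sketch below is an assumption-laden illustration, not the pipeline's own code: `link_fastq`, the species directory name, and the sample name are placeholders, and it relies on POSIX-style symlinks.

```python
import tempfile
from pathlib import Path


def link_fastq(species_dir, sample):
    """Symlink work/getfastq/<sample> to fastq/getfastq/<sample> (illustrative)."""
    raw = species_dir / "fastq" / "getfastq" / sample
    link = species_dir / "work" / "getfastq" / sample
    raw.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.is_symlink():
        link.symlink_to(raw, target_is_directory=True)
    return link


with tempfile.TemporaryDirectory() as tmp:
    species_dir = Path(tmp) / "output" / "amalgkit" / "example_species"
    link = link_fastq(species_dir, "SAMPLE_1")          # placeholder sample name
    (link.resolve() / "reads_1.fastq.gz").touch()       # file lands in fastq/getfastq/
    files_via_link = sorted(p.name for p in link.iterdir())
    resolved_parts = link.resolve().parts[-3:]

print(files_via_link)
print(resolved_parts)
```

Files written through the link end up under `fastq/getfastq/`, so removing that directory frees the raw data while the `work/` tree keeps its layout (with links that no longer resolve).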
Many workflows support configuration files for complex parameter sets:
- RNA workflows: Use `config/amalgkit/*.yaml` files (see RNA Workflow)
- GWAS workflows: Use `config/gwas/*.yaml` files (see GWAS Workflow)
- Network analysis: Template available at `config/networks/networks_template.yaml`
- Multi-omics: Template available at `config/multiomics/multiomics_template.yaml`
- Single-cell: Template available at `config/singlecell/singlecell_template.yaml`
See Configuration Management for details on using configuration files and environment variable overrides.
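To make the idea of environment variable overrides concrete, here is a generic sketch of the pattern. The `DEMO_` prefix, the key names, and the `apply_env_overrides` function are all hypothetical; the actual variable names and precedence rules are those described in Configuration Management, not the ones shown here.

```python
import os


def apply_env_overrides(config, prefix="DEMO_", environ=None):
    """Return a copy of config where PREFIX<KEY> variables replace file values."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for key, current in config.items():
        raw = environ.get(prefix + key.upper())
        if raw is None:
            continue  # no override set for this key
        # Keep integer settings as integers (e.g. "12" -> 12 for threads).
        if isinstance(current, int) and not isinstance(current, bool):
            merged[key] = int(raw)
        else:
            merged[key] = raw
    return merged


file_config = {"work_dir": "output/amalgkit/work", "threads": 8}
fake_env = {"DEMO_THREADS": "12"}  # stands in for os.environ in this example
print(apply_env_overrides(file_config, environ=fake_env))
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.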