JLSteenwyk/orthosnap
a tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees
OrthoSNAP is a tree splitting and pruning tool for retrieving single-copy orthologous subgroups (SNAP-OGs) from larger gene families.
If you find OrthoSNAP useful, please cite:
OrthoSNAP: a tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees. Steenwyk et al. 2022, PLOS Biology. DOI: 10.1371/journal.pbio.3001827.
Full usage documentation and tutorial:
https://jlsteenwyk.com/orthosnap/
What's new in v1.6.0
Compared to v1.5.0 (plotting + performance improvements), v1.6.0 adds workflow-scale and reproducibility features:
- --manifest: batch execution from TSV/CSV manifests.
- --validate-only: preflight input concordance checks without extraction.
- --structured-output: machine-readable run metadata (.run.json) and subgroup summaries (.subgroups.tsv).
- --occupancy-count / --occupancy-fraction: explicit occupancy semantics.
- --resume: skip rerunning completed analyses.
- --bootstrap-trees + --consensus-min-frequency + --consensus-trees: consensus subgrouping across bootstrap tree uncertainty.
Compared to older releases:
- v1.5.0 focused on plotting and runtime optimization.
- v1.3.2 introduced configurable delimiters.
- v1.2.0 added inparalog handling reports.
- v1.0.0 and earlier focused on core pruning behavior.
Installation
Install with pip (recommended)
python -m venv .venv
source .venv/bin/activate
pip install orthosnap
Install from source
git clone https://github.com/JLSteenwyk/orthosnap.git
cd orthosnap
python -m venv .venv
source .venv/bin/activate
make install
Install with conda
conda install -c jlsteenwyk orthosnap
Conda package details:
https://anaconda.org/jlsteenwyk/orthosnap
Quick start
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre
Generate a color-coded SNAP-OG assignment plot for the full tree:
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre -ps
Choose plot format (png default, pdf or svg):
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre -ps -pf svg
Show all CLI options:
orthosnap -h
Run validation checks only (no subgroup extraction):
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre --validate-only
Write structured provenance outputs (.run.json and .subgroups.tsv):
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre --structured-output
Resume an interrupted or previously completed run:
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre --resume
Use explicit occupancy semantics:
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre --occupancy-count 5
orthosnap -f orthogroup_of_genes.faa -t phylogeny_of_orthogroup_of_genes.tre --occupancy-fraction 0.5
Run many orthogroups from a manifest (TSV/CSV with tree and fasta columns):
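A manifest of that shape could be assembled with a short shell function. This is only a sketch: the header names tree and fasta, the orthogroups/ directory, and the convention that each .faa file sits next to a same-named .tre file are assumptions made for illustration, so check the full documentation for the authoritative format.

```shell
# Sketch: build a TSV manifest pairing each FASTA file with a same-named tree.
# Assumptions (not confirmed by the OrthoSNAP docs): header names "tree" and
# "fasta", and inputs laid out as <dir>/<name>.faa next to <dir>/<name>.tre.
write_manifest() {
    dir="$1"
    out="$2"
    printf 'tree\tfasta\n' > "$out"
    for fasta in "$dir"/*.faa; do
        [ -e "$fasta" ] || continue          # glob matched nothing
        tree="${fasta%.faa}.tre"             # swap the extension to find the tree
        if [ -f "$tree" ]; then
            printf '%s\t%s\n' "$tree" "$fasta" >> "$out"
        else
            echo "skipping $fasta: no matching tree" >&2
        fi
    done
}

# Hypothetical input directory and output name
write_manifest orthogroups runs.tsv
```

The resulting runs.tsv would then be passed to --manifest, as in the command below.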
orthosnap --manifest runs.tsv --structured-output -op results/
Run bootstrap consensus mode using a file of tree paths (one per line):
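Such a path list could be generated rather than written by hand. The sketch below assumes bootstrap replicate trees are stored as individual .tre files in a bootstraps/ directory; both the directory name and the extension are hypothetical, and whether relative or absolute paths are expected is not stated here.

```shell
# Sketch: write one bootstrap tree path per line.
# Assumption: replicates live in <dir>/*.tre (hypothetical layout).
list_tree_paths() {
    dir="$1"
    out="$2"
    : > "$out"                               # start from an empty file
    for tree in "$dir"/*.tre; do
        [ -f "$tree" ] || continue           # glob matched nothing
        printf '%s\n' "$tree" >> "$out"
    done
}

# Hypothetical input directory; output name matches the command below
list_tree_paths bootstraps bootstrap_paths.txt
```

The file written here is the one supplied to --bootstrap-trees in the command below.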
orthosnap -f orthogroup_of_genes.faa -t reference.treefile --bootstrap-trees bootstrap_paths.txt --consensus-min-frequency 0.5
Also write consensus Newick trees:
orthosnap -f orthogroup_of_genes.faa -t reference.treefile --bootstrap-trees bootstrap_paths.txt --consensus-trees
Support
If installation fails in a clean virtual environment, contact Jacob L. Steenwyk via:
- Email: https://jlsteenwyk.com/contact.html
- Twitter/X: https://twitter.com/jlsteenwyk