JLSteenwyk/orthofisher
a broadly applicable tool for automated gene identification and retrieval
Orthofisher conducts automated, high-throughput identification of a predetermined set of orthologs, which can be used for phylogenomics, gene family copy number determination, and more!
If you found orthofisher useful, please cite: "orthofisher: a broadly applicable tool for automated gene identification and retrieval". Steenwyk & Rokas 2021, G3 Genes|Genomes|Genetics. doi: 10.1093/g3journal/jkab250.
Guide
Quick Start
Performance Assessment
FAQ
Quick Start
For detailed instructions on usage and a tutorial, please see the online documentation.
1) Prerequisite
Before installing orthofisher, please first install HMMER3 and add the directory containing the HMMER binaries to your PATH in .bashrc. For example, my .bashrc has the following:
export PATH=$PATH:/home/steenwj/SOFTWARE/hmmer-3.1b2-linux-intel-x86_64/binaries
2) Install orthofisher
Supported Python versions: 3.10, 3.11, 3.12, and 3.13.
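Before installing, it can help to confirm that the interpreter version is supported and that the HMMER programs are reachable on your PATH. The following is a minimal sketch, not part of orthofisher; the function names are hypothetical, and it assumes that hmmsearch and nhmmer (the two HMMER programs named in this README) are the ones that need to be found.

```python
import shutil
import sys

# Python versions listed as supported in this README.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12), (3, 13)}
# HMMER3 programs named in this README (assumed to be the ones required).
REQUIRED_TOOLS = ("hmmsearch", "nhmmer")


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version is supported."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print(f"Unsupported Python: {sys.version_info[0]}.{sys.version_info[1]}")
    for tool in missing_tools():
        print(f"Not found on PATH: {tool}")
```

If the script prints nothing, both checks passed. If a HMMER program is reported as missing, revisit the PATH line in your .bashrc and open a new shell.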
If you are having trouble installing orthofisher, please contact the lead developer, Jacob L. Steenwyk, via email or twitter to get help.
To install via anaconda, execute the following command:
conda install -c jlsteenwyk orthofisher
For more information, see https://anaconda.org/jlsteenwyk/orthofisher
To install via pip, execute the following command:
pip install orthofisher
To install from source, execute the following commands:
# download
git clone https://github.com/JLSteenwyk/orthofisher.git
# change dir
cd orthofisher/
# install
make install
If you run into software dependency issues, install orthofisher in a virtual environment. To do so, create your virtual environment with the following command:
# create virtual environment
python -m venv .venv
# activate virtual environment
source .venv/bin/activate
Next, install the software using your preferred method above. Thereafter, you will be able to use orthofisher.
To deactivate your virtual environment, use the following command:
# deactivate virtual environment
deactivate
Note: the virtual environment must be activated to use orthofisher.
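If you are unsure whether the virtual environment is currently active, one way to check from Python is to compare sys.prefix with sys.base_prefix, which differ inside an environment created with python -m venv. This is a small sketch with a hypothetical function name; it does not detect conda environments.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```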
3) Run orthofisher
orthofisher -m hmms.txt -f fasta_arg.txt
To query nucleotide sequences with nucleotide HMMs, use:
orthofisher -m hmm_nucl.txt -f fasta_nucl.txt --seq-type nucleotide
The default --seq-type auto mode infers the tool per HMM from the ALPH header:
- ALPH DNA / ALPH RNA -> nhmmer
- other alphabets (for example amino-acid models) -> hmmsearch
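To illustrate the auto mode, the sketch below reads the ALPH line from an HMMER3 profile header and picks a search program using the rule above. It is an illustration under stated assumptions, not orthofisher's actual implementation: the function names are hypothetical, only the first model in a file is inspected, and a profile with no ALPH line falls through to hmmsearch.

```python
import io


def read_alphabet(handle):
    """Return the lowercased ALPH value from the first model header, or None."""
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ALPH" and len(fields) > 1:
            return fields[1].lower()
        if fields[0] == "HMM":
            break  # the header section is over; the model body starts here
    return None


def choose_search_tool(alphabet):
    """Map an alphabet name to a HMMER program, following the rule above."""
    return "nhmmer" if alphabet in {"dna", "rna"} else "hmmsearch"


# A made-up, truncated header used only for demonstration.
example = io.StringIO(
    "HMMER3/f [3.1b2 | February 2015]\n"
    "NAME  example\n"
    "LENG  10\n"
    "ALPH  DNA\n"
    "HMM   A C G T\n"
)
print(choose_search_tool(read_alphabet(example)))  # nhmmer
```

For a real profile, open the file and pass the handle, for example `with open(path) as handle: read_alphabet(handle)`.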
By default, orthofisher now writes a slim output:
- scog/
- long_summary.txt
- short_summary.txt
To additionally write larger raw outputs (all_sequences/ and hmmsearch_output/), use:
orthofisher -m hmms.txt -f fasta_arg.txt --verbose-output
If your output directory already exists, orthofisher will stop with an error.
To overwrite the existing output directory, add:
orthofisher -m hmms.txt -f fasta_arg.txt --force
You can combine both flags:
orthofisher -m hmms.txt -f fasta_arg.txt --verbose-output --force
To continue an interrupted run in an existing output directory, use:
orthofisher -m hmms.txt -f fasta_arg.txt -o orthofisher_output --resume
--resume reuses checkpoint state, skips completed FASTA/HMM pairs, and rewrites short_summary.txt from the resumed totals.
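If you launch orthofisher from a Python script, for example inside a larger pipeline, the command line can be assembled as a list of arguments. The sketch below uses only the flags shown in this section; build_command and its parameter names are hypothetical helpers, not part of orthofisher.

```python
import shlex


def build_command(hmms, fastas, output_dir=None, seq_type=None,
                  verbose_output=False, force=False, resume=False):
    """Assemble an orthofisher command line as an argument list."""
    cmd = ["orthofisher", "-m", hmms, "-f", fastas]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if seq_type is not None:
        cmd += ["--seq-type", seq_type]
    if verbose_output:
        cmd.append("--verbose-output")
    if force:
        cmd.append("--force")
    if resume:
        cmd.append("--resume")
    return cmd


cmd = build_command("hmms.txt", "fasta_arg.txt",
                    output_dir="orthofisher_output", resume=True)
print(shlex.join(cmd))
# orthofisher -m hmms.txt -f fasta_arg.txt -o orthofisher_output --resume
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with file paths that contain spaces.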
Performance Assessment
Using 1,530 sequence similarity searches across six model eukaryotic proteomes, the performance of orthofisher was compared to results obtained from BUSCO. Examination of precision and recall revealed near-perfect performance. More specifically, orthofisher had a recall of 1.0 and a precision of 0.99. Precision is less than 1.0 because BUSCO applies priors on expected sequence length and sequence similarity scores, which are not implemented in orthofisher, so the BUSCO pipeline reported more missing genes than the orthofisher pipeline.
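For readers less familiar with these metrics, the sketch below shows how precision and recall are computed from counts of true positives, false positives, and false negatives. The counts in the example are purely illustrative, chosen to reproduce values of the same magnitude as those reported above; they are not the data from the assessment.

```python
def precision(true_positives, false_positives):
    """Fraction of reported hits that are correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Fraction of expected hits that were recovered."""
    return true_positives / (true_positives + false_negatives)


# Illustrative counts only (hypothetical, not taken from the study).
tp, fp, fn = 99, 1, 0
print(precision(tp, fp))  # 0.99
print(recall(tp, fn))     # 1.0
```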
FAQ
I am having trouble installing orthofisher, what should I do?
Please install orthofisher using a virtual environment as described in the installation instructions. If you are still running into issues after installing in a virtual environment, please contact Jacob L. Steenwyk via email or twitter.