zaixizhang/RNAGenesis
RNAGenesis: A Generalist Foundation Model for Functional RNA Therapeutics
RNAGenesis: A Generalist Foundation Model for Functional RNA Therapeutics
π Overview
RNAGenesis is a compact yet powerful RNA foundation model that unifies sequence understanding, de novo RNA design, and 3D structure prediction through a latent diffusion framework.
π Key Features
-
π Inference-time optimization
Introduces test-time directed generation strategiesβcombining tree search and gradient-based model guidanceβto steer RNA design toward desired structure and function. -
π State-of-the-art performance
Achieves top results in 11 of 13 tasks on the BEACON benchmark for RNA sequence understanding. -
𧬠Versatile RNA generation
Synthesizes diverse non-coding RNAs, including natural-like aptamers and structurally optimized CRISPR sgRNAs. -
π§ͺ Experimental validation
RNAGenesis-designed sgRNAs outperform wild-type scaffolds in gene knockout efficiency, with up to 2Γ improvement across CRISPR-Cas9, base editing, and prime editing platforms.
π Results
CRISPR sgRNA Design and Wet-lab Validation
π οΈ Installation
Option 1: Install via conda yaml file
# Create and activate conda environment
conda env create -f environment.yml
conda activate rnagenesisOption 2: Install via Conda and Pip
# Create and activate conda environment
conda create -n rnagenesis python=3.8.13
conda activate rnagenesis
# Install dependencies
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2
pip install transformers==4.32.1 diffusers==0.25.0 accelerate==0.25.0 logomaker==0.8 biopython==1.83 sentencepiece==0.1.99 huggingface-hub==0.28.1 wandb==0.19.6 pytorch-lightning==2.1.3 torchmetrics==1.2.1 xgboost==1.5.2 omegaconf==2.3.0
conda install numpy=1.22.0 pandas=1.3.1 scipy=1.10.1 matplotlib=3.7.1 seaborn=0.13.2 scikit-learn=0.24.0 jupyterlab=2.3.2 ipython=8.3.0 ipykernel=6.13.0 openssl=1.1.1o zlib=1.2.11 ca-certificates=2021.10.8 setuptools=59.5.0 wheel=0.37.1
pip install rna-fm
conda install -c bioconda viennarnaπ₯ Download and Extract Model Weights
# Create checkpoints directory
mkdir -p checkpoints
mkdir -p configs
# Download model weights
wget https://zenodo.org/records/15203813/files/checkpoints.zip?download=1
wget https://zenodo.org/records/15203813/files/configs.zip?download=1
wget https://zenodo.org/records/15203813/files/progen2-base.zip?download=1
wget https://zenodo.org/records/15203813/files/progen2-small.zip?download=1
# Extract model weights
unzip checkpoints.zip -d checkpoints/
unzip configs.zip -d confiigs/
unzip progen2-base.zip -d models/autoencoder/decoder/checkpoints/progen2-base/
unzip progen2-small.zip -d models/autoencoder/decoder/checkpoints/progen2-small/
# Clean up
rm -f checkpoints.zip
rm -f configs.zip
rm -f progen2-base.zip
rm -f progen2-small.zipπ Inference Pipeline
Inference Steps
- Generation with RNAGensis:
# RNAGenesis python generate.py \ --batch_size 128 \ --batch_num 200 \ --eos_token "2" \ --do_sample \ --top_p 0.95 \ --top_k 0 \ --max_seq_len 37 \ --enc_dec_file "configs/rnagenesis/autoencoder" \ --dm_file "checkpoints/Aptamer/diffusion" \ # for aptamer generation --superfolder "generation_sequences" \ --mid_folder "RNAGenesis_Aptamer"
# RNAGenesis python generation.py \ --batch_size 128 \ --batch_num 200 \ --eos_token "2" \ --do_sample \ --top_p 0.95 \ --top_k 0 \ --max_seq_len 64 \ --enc_dec_file "configs/rnagenesis/autoencoder" \ --dm_file "checkpoints/sgRNA/diffusion" \ # for sgRNA generation --superfolder "generation_sequences" \ --mid_folder "RNAGenesis_sgRNA"
- Generation with Guidance RNAGenesis:
# guidance RNAGenesis python generation.py \ --batch_size 128 \ --batch_num 200 \ --eos_token "2" \ --do_sample \ --top_p 0.95 \ --top_k 0 \ --max_seq_len 64 \ --enc_dec_file "configs/rnagenesis/autoencoder" \ --dm_file "checkpoints/sgRNA/diffusion" \ # for sgRNA generation --guidance \ --target_class 0 \ --guidance_classifier_model_config "configs/rangenesis/classifier/mlp_160_32.yaml" \ --classifier_loss_type 'ce' \ --guidance_scale 50.0 \ --recurrence_step 1 \ --superfolder "generation_sequences" \ --mid_folder "Guid_sgRNA"
- Generation with Beam-Search RNAGensis:
# tree search RNAGenesis python generation.py \ --batch_size 128 \ --batch_num 200 \ --eta 1 \ --search_general \ --search_goal "similarity" \ --active_size 1 \ --branch_size 8 \ --eos_token "2" \ --do_sample \ --top_p 0.95 \ --top_k 0 \ --max_seq_len 37 \ --enc_dec_file "configs/rnagenesis/autoencoder" \ --dm_file "checkpoints/sgRNA/diffusion" \ # for sgRNA generation --superfolder "generation_sequences" \ --mid_folder "BS_sgRNA"
It takes around 5 hours to generate all the sequences on 1 A100 GPU.
π Encoder Checkpoint for Embeddings
We also release the RNAGenesis encoder checkpoint on Hugging Face, enabling direct extraction of RNA embeddings for downstream applications such as sequence classification, clustering, and functional annotation. This allows researchers to leverage RNAGenesis not only for generation but also as a universal RNA representation model that can be integrated into diverse pipelines.
Examples of Generated Scaffolds by RNAGenesis Aligned with Wildtype
π Citation
If you find this work helpful, please cite our paper:
@article{zhang2024rna,
title={RNAGenesis: Foundation Model for Enhanced RNA Sequence Generation and Structural Insights},
author={Zhang, Zaixi and Chao, Linlin and Jin, Ruofan and Zhang, Yikun and Zhou, Guowei and Yang, Yujie and Yang, Yukang and Huang, Kaixuan and Yang, Qirong and Xu, Ziyao and Zhang, Xiaoming and Cong, Le and Wang, Mengdi},
journal={bioRxiv},
pages={2024--12},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}π Acknowledgments
We thank the following open-source projects for their valuable contributions:
π License
This project is licensed under the MIT License - see the LICENSE file for details.


