deepbiolab/rag-for-proteins

RAG for Proteins

A demonstration of Retrieval-Augmented Generation (RAG) applied to protein analysis, currently focusing on antibody data from SAbDab (Structural Antibody Database).

🌟 Features

Current:

  • RAG-powered chat interface for antibody structure analysis
  • Integration with SAbDab for antibody structural data
  • Local LLM support via Ollama (currently using Qwen 7B)
  • Persistent vector storage for efficient retrieval
  • Clean, modular architecture with separate components for:
    • Data loading and preprocessing
    • Vector storage management
    • LLM interface
    • RAG pipeline coordination
    • Streamlit UI
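The component split listed above can be illustrated with a minimal, standard-library sketch of the retrieve-then-prompt flow. This is not the project's code: the records, the bag-of-words `embed` function, the `VectorStore` class, and `build_prompt` are all hypothetical stand-ins, and a real setup would presumably use a learned embedding model and a persistent vector database instead.

```python
import math
import re
from collections import Counter

# Hypothetical toy records; the real project loads antibody data from SAbDab.
RECORDS = [
    {"id": "rec-1", "text": "Antibody heavy chain variable domain bound to a viral spike antigen"},
    {"id": "rec-2", "text": "Fab fragment crystal structure with kappa light chain"},
    {"id": "rec-3", "text": "Nanobody single domain antibody in complex with a membrane receptor"},
]


def embed(text):
    """Toy 'embedding': lowercase token counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory store; a persistent store would write the vectors to disk."""

    def __init__(self):
        self._items = []

    def add(self, record):
        self._items.append((record, embed(record["text"])))

    def search(self, query, k=2):
        """Return the k records most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [record for record, _ in ranked[:k]]


def build_prompt(question, context_records):
    """Assemble the retrieved records and the question into a single LLM prompt."""
    context = "\n".join(f"[{r['id']}] {r['text']}" for r in context_records)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


store = VectorStore()
for record in RECORDS:
    store.add(record)

question = "Which entry describes a nanobody bound to a membrane receptor?"
print(build_prompt(question, store.search(question, k=1)))
```

The prompt produced in the last step is what the LLM interface would send to the model; keeping retrieval and prompt assembly in separate functions mirrors the modular layout described in the feature list.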

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Ollama installed and running
  • Required Python packages:
pip install -r requirements.txt
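
Because the prerequisites call for a running Ollama instance, a small script like the one below can help confirm that the local server responds before launching the app. It is a sketch under assumptions: it targets Ollama's default local address and its `/api/generate` endpoint, and the model tag `qwen:7b` is a guess, since the README only says "Qwen 7B". The helper names `build_request` and `generate` are illustrative and not part of this repository.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: the exact model tag used by the project is not stated in the README.
MODEL = "qwen:7b"


def build_request(prompt, model=MODEL):
    """Build a non-streaming generate request without sending it."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model=MODEL, timeout=60):
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Say hello")` should return a short completion if the server is up and the model has been pulled; a connection error usually means Ollama is not running.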

MIT License
Created April 6, 2025
Updated April 7, 2025