
bit-whacker/graphrag-genomics

Local model support for Microsoft's GraphRAG using Ollama (llama3, mistral, gemma2, phi3): LLM and embedding extraction

📊 GraphRAG-genomics

GraphRAG-Omics extends Microsoft's GraphRAG library and the TheAiSingularity/graphrag-local-ollama library. It lets users convert unstructured documents into knowledge graphs and query them in natural language.

This is an experimental repository in which the prompts are tailored to genomics and clinical documents.


🚀 Main Highlights

  • 📄 Document Indexing: Convert raw .txt documents into .parquet files using the graphrag library.
  • 🧠 Knowledge Graph Generation: Transform indexed documents into a structured knowledge graph stored in a Neo4j server.
  • 💬 Natural Language Querying: Interact with your knowledge graph through a Streamlit web interface: ask questions, get insights.

🗂️ Project Structure

graphrag-omics/
│
├── graphrag_workflow.bat      # Command-line script to index documents
├── app.py                     # Streamlit app for graph creation and querying
├── input/                     # Directory to place raw .txt documents
└── proj_<project_name>/       # Generated output for each project

Components

  1. Command-Line Indexing Script

    • Takes input .txt documents
    • Outputs .parquet files into a project-specific folder
  2. Streamlit Web App

    • Indexing Tab: Load .parquet files and generate a knowledge graph in Neo4j
    • Query Tab: Use natural language to query your knowledge graph (GraphRAG interface)
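
To make the Indexing Tab step more concrete, here is a minimal sketch of how rows read from a .parquet file could be turned into a parameterized Cypher upsert. It is not the code used in app.py: the `Entity` label, the `name`, `type`, and `description` keys, the helper name `build_entity_upsert`, and the sample row are all assumptions made for illustration, and the actual column names depend on the graphrag version that produced the files.

```python
# Hypothetical sketch: build a parameterized Cypher upsert from entity rows.
# The label "Entity" and the keys "name", "type", "description" are assumed,
# not taken from this repository.

ENTITY_UPSERT_QUERY = """
UNWIND $rows AS row
MERGE (e:Entity {name: row.name})
SET e.type = row.type,
    e.description = row.description
""".strip()


def build_entity_upsert(rows):
    """Return (query, params); rows without a usable name are skipped."""
    cleaned = []
    for row in rows:
        name = str(row.get("name") or "").strip()
        if not name:
            continue  # MERGE needs a non-empty key to match on
        cleaned.append(
            {
                "name": name,
                "type": row.get("type"),
                "description": row.get("description"),
            }
        )
    return ENTITY_UPSERT_QUERY, {"rows": cleaned}


# Possible use with the official neo4j Python driver (connection details assumed):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<password>"))
#   with driver.session() as session:
#       session.run(*build_entity_upsert(rows))
```

Using MERGE rather than CREATE keeps the load idempotent, so re-running the Indexing Tab on the same project would update existing nodes instead of duplicating them.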

🧪 How to Run

Prerequisites

  1. Install all required libraries.
  2. Install Neo4j Desktop.
  3. Install graphrag by running the following command from the root directory of the project:
pip install -e .
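
After installing, a quick check that the packages can be imported may save a failed run later. The sketch below is an illustration only: the module names `graphrag`, `streamlit`, and `neo4j` are assumptions about what the project imports, and `missing_modules` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec

# Assumed import names; adjust to whatever the project actually imports.
ASSUMED_MODULES = ["graphrag", "streamlit", "neo4j"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def report(names=ASSUMED_MODULES):
    """Print one line per module, marking it as found or missing."""
    missing = set(missing_modules(names))
    for name in names:
        print(f"{name}: {'MISSING' if name in missing else 'ok'}")
```

Calling `report()` in the same environment where `pip install -e .` was run lists which of the assumed modules still need to be installed.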

Step 1: Index Your Documents

  1. Place your .txt documents inside the input/ folder (located in the root of the project).
  2. Run the following command:
bash graphrag_workflow.bat proj_<project_name>

🔒 The project name must start with proj_
✅ Example: For a project named "med", use:

bash graphrag_workflow.bat proj_med

This will create a folder proj_med/ and generate the .parquet files inside it.
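
The naming rule and the command above can be wrapped in a small Python helper. This is a minimal sketch rather than part of the repository: the function names are hypothetical, it assumes `bash` is on the PATH and that it is run from the project root, and requiring a non-empty suffix after `proj_` is an extra assumption beyond the stated rule.

```python
import subprocess
from pathlib import Path

PROJECT_PREFIX = "proj_"  # required prefix, per the rule above


def validate_project_name(name: str) -> str:
    """Return the name unchanged if it starts with 'proj_' and has a suffix."""
    # The non-empty suffix check is an assumption; the README only states the prefix.
    if not name.startswith(PROJECT_PREFIX) or len(name) == len(PROJECT_PREFIX):
        raise ValueError(
            f"project name must start with {PROJECT_PREFIX!r}, got {name!r}"
        )
    return name


def build_index_command(name: str) -> list:
    """Build the same command line shown above for a validated project name."""
    return ["bash", "graphrag_workflow.bat", validate_project_name(name)]


def run_indexing(name: str) -> None:
    """Run the indexing script; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_index_command(name), check=True)


def list_parquet_outputs(project_dir: str) -> list:
    """Return all .parquet files under the project folder, sorted by path."""
    return sorted(Path(project_dir).rglob("*.parquet"))
```

For the example project, `run_indexing("proj_med")` followed by `list_parquet_outputs("proj_med")` would run the script and then show which .parquet files were produced.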


Step 2: Generate the Knowledge Graph & Query

  1. Start the Streamlit app:
streamlit run app.py
  2. Switch to the browser tab that Streamlit opens automatically.
  3. Use the following tabs inside the app:
    • Indexing: Select a project (e.g., proj_med) and generate the knowledge graph in Neo4j.
    • Query: Ask questions in natural language, answered from the generated knowledge graph (GraphRAG style).

📌 Notes

  • Only .txt documents are currently supported.
  • Ensure that the Neo4j server is running before using the Indexing or Query functionality in the app.
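
Both notes can be checked before starting a run. The sketch below is illustrative and not part of the repository: `split_inputs` and `neo4j_is_reachable` are hypothetical helpers, and the host and port assume a local Neo4j instance listening on the default Bolt port 7687. A successful TCP connection only shows that something is listening there, not that credentials are valid.

```python
import socket
from pathlib import Path


def split_inputs(input_dir: str):
    """Split the files in input_dir into (.txt files, all other files)."""
    supported, unsupported = [], []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # ignore sub-directories
        if path.suffix == ".txt":
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


def neo4j_is_reachable(host: str = "localhost", port: int = 7687,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the given host and port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A pre-flight script could call `split_inputs("input")` to warn about files that will be ignored, and `neo4j_is_reachable()` before opening the Indexing or Query tab.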

🧬 Use Cases

  • Genomics research papers
  • Clinical documents & patient summaries
  • Biomedical literature mining
  • Interactive Q&A from specialized unstructured data

Languages

Python 96.7%, Jupyter Notebook 2.1%, Nunjucks 0.6%, Shell 0.3%, Jinja 0.2%, CSS 0.1%, JavaScript 0.1%

MIT License
Created April 21, 2025
Updated April 21, 2025