`prishadhoot/old-minimal-rag-system`: POC RAG system using faiss-cpu, SQLite, and Gemini
# MRAS - Minimal RAG Agent System (OLD)
A local, production-style Retrieval-Augmented Generation (RAG) system built with simplicity and determinism in mind.
## Overview
MRAS is a proof-of-concept RAG system that demonstrates:
- Document ingestion and chunking
- Local embedding generation
- Vector storage with FAISS
- Semantic retrieval
- LLM-powered reasoning (Gemini)
- Minimal agent loop
- Evaluation harness
## Architecture

```
Document Ingestion
  → Chunking
  → Embedding
  → FAISS Vector Index
  → Retriever
  → LLM (Gemini)
  → Agent Loop
  → FastAPI API Layer
```
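The pipeline above can be sketched end to end in plain Python. This is a toy illustration only: an L2-normalized bag-of-words vector stands in for the MiniLM embedding, and a brute-force inner-product scan stands in for the FAISS index.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 6) -> list[str]:
    # Fixed-size word windows (the real system uses 500-word chunks).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> dict[str, float]:
    # Toy "embedding": an L2-normalized bag-of-words vector.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def search(index: list[tuple[str, dict[str, float]]],
           query: str, top_k: int = 2) -> list[str]:
    # Brute-force inner product: the same scoring IndexFlatIP performs.
    q = embed(query)
    scored = [(sum(q.get(w, 0.0) * x for w, x in vec.items()), cid)
              for cid, vec in index]
    return [cid for _, cid in sorted(scored, reverse=True)[:top_k]]

doc = "Paris is the capital of France Berlin is the capital of Germany"
index = [(f"doc1_{i}", embed(c)) for i, c in enumerate(chunk(doc))]
print(search(index, "capital of France"))  # → ['doc1_0', 'doc1_1']
```

With real embeddings the scores are semantic rather than lexical, but the data flow (ingest → chunk → embed → index → retrieve) is identical.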
## Tech Stack
- Language: Python 3.11
- API: FastAPI + Uvicorn
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- Vector DB: FAISS (CPU version)
- LLM: Gemini API
- Storage: SQLite + Local filesystem
## Setup
### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Set Gemini API Key (Legacy)

```bash
export GEMINI_API_KEY="your-api-key-here"
```

On Windows (PowerShell):

```powershell
$env:GEMINI_API_KEY="your-api-key-here"
```

### 3. Run the Server

```bash
uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000.
## API Endpoints
### Health Check

```
GET /health
```

### Ingest Documents

```
POST /ingest
{
  "folder_path": "path/to/documents"
}
```

### Query

```
POST /query
{
  "query": "What is the capital of France?",
  "top_k": 5
}
```

### Evaluate

```
POST /evaluate
```

## Usage Example
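The endpoints can also be exercised from Python using only the standard library. The sketch below assumes the server from the Setup section is running on localhost:8000 and mirrors the curl commands shown in this section.

```python
import json
from urllib import request

API_URL = "http://localhost:8000"

def build_request(path: str, payload: dict) -> request.Request:
    # Encode the JSON body the same way the curl examples do.
    return request.Request(
        API_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query(text: str, top_k: int = 5) -> dict:
    # POST /query and decode the JSON response.
    req = build_request("/query", {"query": text, "top_k": top_k})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Requires the server to be running:
# result = query("What is RAG?")
```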
1. Add documents to the `data/documents` folder:

   ```bash
   # Copy your .txt, .md, or .pdf files to data/documents/
   ```

2. Ingest documents:

   ```bash
   curl -X POST "http://localhost:8000/ingest" \
     -H "Content-Type: application/json" \
     -d '{"folder_path": "data/documents"}'
   ```

3. Query the system:

   ```bash
   curl -X POST "http://localhost:8000/query" \
     -H "Content-Type: application/json" \
     -d '{"query": "What is RAG?", "top_k": 5}'
   ```

## Project Structure
```
mras/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI application
│   ├── config.py        # Configuration settings
│   ├── models.py        # Pydantic data models
│   ├── ingestion.py     # Document loading
│   ├── chunking.py      # Text chunking
│   ├── embedding.py     # Embedding generation
│   ├── vector_store.py  # FAISS vector storage
│   ├── retriever.py     # Semantic retrieval
│   ├── llm.py           # Gemini LLM client
│   ├── agent.py         # RAG agent loop
│   └── evaluation.py    # Evaluation metrics
├── data/
│   ├── documents/       # Source documents
│   ├── faiss.index      # FAISS index (generated)
│   ├── metadata.db      # SQLite metadata (generated)
│   └── eval.json        # Evaluation dataset
├── tests/               # Test files
├── requirements.txt     # Python dependencies
└── README.md            # This file
```
## Design Principles
- Minimal moving parts - No unnecessary abstractions
- Clear module boundaries - Each module has a single responsibility
- Deterministic execution - No random behavior
- No overengineering - Simple solutions over complex ones
- Local deployment - Everything runs locally
## Constraints
- Fully local deployment
- Free tooling only
- Simplicity prioritized over features
- No LangChain or LlamaIndex
- No async complexity
- Deterministic behavior
## Evaluation
Create an `eval.json` file in the `data/` directory with the following format:

```json
[
  {
    "question": "What is machine learning?",
    "expected_keywords": ["algorithm", "data", "prediction"]
  },
  {
    "question": "How does a neural network work?",
    "expected_keywords": ["neuron", "layer", "activation"]
  }
]
```

Then call the `/evaluate` endpoint to get metrics:

- Average keyword match score
- Average response latency
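The README does not pin down how the keyword match score is computed; a natural reading, shown here as an assumption rather than the repository's actual code, is the fraction of expected keywords present in the answer:

```python
def keyword_match_score(answer: str, expected_keywords: list[str]) -> float:
    # Fraction of expected keywords found in the answer (case-insensitive
    # substring match).
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0

def average_score(results: list[tuple[str, list[str]]]) -> float:
    # Mean score over (answer, expected_keywords) pairs.
    scores = [keyword_match_score(a, kws) for a, kws in results]
    return sum(scores) / len(scores) if scores else 0.0

score = keyword_match_score(
    "Machine learning algorithms learn patterns from data.",
    ["algorithm", "data", "prediction"],
)
print(score)  # → 0.6666666666666666
```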
## Module Interfaces

### Ingestion
- Loads `.txt`, `.md`, and `.pdf` files
- Returns `(document_id, text)` tuples

### Chunking
- 500-word chunks with 100-word overlap
- Deterministic chunk IDs: `{document_id}_{index}`
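The chunking contract above can be sketched as follows; this is an illustration of the stated parameters, not the repository's code:

```python
def chunk_document(document_id: str, text: str,
                   chunk_size: int = 500, overlap: int = 100):
    # Slide a fixed window over the word list; consecutive chunks share
    # `overlap` words, and IDs are deterministic: {document_id}_{index}.
    # Assumes overlap < chunk_size so the window always advances.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + chunk_size]
        if not window:
            break
        chunks.append((f"{document_id}_{index}", " ".join(window)))
        if start + chunk_size >= len(words):
            break  # the last window already covers the tail
    return chunks

# Small demo: 12 "words", 5-word chunks, 2-word overlap.
for cid, body in chunk_document("doc1", " ".join(map(str, range(12))),
                                chunk_size=5, overlap=2):
    print(cid, "->", body)
```

Re-running ingestion on the same document therefore always produces the same chunk IDs, which keeps the index rebuild deterministic.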
### Embedding
- all-MiniLM-L6-v2 model
- 384-dimensional vectors
- L2-normalized
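Because the embeddings are L2-normalized, the inner product computed by FAISS's `IndexFlatIP` equals cosine similarity, which is why a plain inner-product index suffices for semantic search here. A quick check in plain Python:

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
inner = dot(l2_normalize(a), l2_normalize(b))
print(round(inner, 6), round(cosine(a, b), 6))  # → 0.96 0.96
```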
### Vector Store
- FAISS `IndexFlatIP`
- SQLite metadata storage
- Persistent index

### Retriever
- Semantic search only
- Returns top-k chunks

### LLM Client
- Gemini API integration
- NOT_FOUND protocol enforcement

### Agent
- Deterministic loop with one retry
- Returns answer + source chunk IDs
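The agent contract can be sketched as below. `retrieve` and `generate` are hypothetical stand-ins for the retriever and LLM modules, and widening `top_k` on the retry is an assumption for illustration; the source only specifies one deterministic retry and the NOT_FOUND protocol.

```python
NOT_FOUND = "NOT_FOUND"

def answer_query(query, retrieve, generate, top_k=5):
    # One deterministic pass plus a single retry (here with a wider
    # context window), mirroring the "loop with one retry" contract.
    for attempt, k in enumerate((top_k, top_k * 2)):
        chunks = retrieve(query, k)            # [(chunk_id, text), ...]
        context = "\n".join(text for _, text in chunks)
        answer = generate(query, context)
        if answer != NOT_FOUND:
            return {"answer": answer,
                    "sources": [cid for cid, _ in chunks],
                    "attempts": attempt + 1}
    return {"answer": NOT_FOUND, "sources": [], "attempts": 2}

# Hypothetical stubs to show the shape of the contract:
def fake_retrieve(query, k):
    return [(f"doc1_{i}", f"chunk text {i}") for i in range(k)]

def fake_generate(query, context):
    return "Paris is the capital of France."

print(answer_query("capital of France?", fake_retrieve, fake_generate, top_k=2))
```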
## Skills Demonstrated
- Embedding space reasoning
- Cosine similarity
- Vector search fundamentals
- Modular architecture
- API engineering
- LLM prompt control
- Evaluation mindset
- Deterministic system design
## License

## Author

Prisha Dhoot

*(Built as a proof-of-concept for understanding RAG systems from first principles.)*

> **Note:** This README has been AI-generated.