# All RAG Techniques: A Simpler, Hands-On Approach ✨
This repository takes a clear, hands-on approach to Retrieval-Augmented Generation (RAG), breaking down advanced techniques into straightforward, understandable implementations. Instead of relying on frameworks like LangChain or FAISS, everything here is built using familiar Python libraries: `openai`, `numpy`, `matplotlib`, and a few others.
The goal is simple: provide code that is readable, modifiable, and educational. By focusing on the fundamentals, this project helps demystify RAG and makes it easier to understand how it really works.
## 📖 What's Inside?
This repository contains a collection of Jupyter Notebooks, each focusing on a specific RAG technique. Each notebook provides:
- A concise explanation of the technique.
- A step-by-step implementation from scratch.
- Clear code examples with inline comments.
- Evaluations and comparisons to demonstrate the technique's effectiveness.
- Visualizations of the results.
Here's a glimpse of the techniques covered:
| Notebook | Description |
|---|---|
| 1. Simple RAG | A basic RAG implementation. A great starting point! |
| 2. Semantic Chunking | Splits text based on semantic similarity for more meaningful chunks. |
| 3. Chunk Size Selector | Explores the impact of different chunk sizes on retrieval performance. |
| 4. Context Enriched RAG | Retrieves neighboring chunks to provide more context. |
| 5. Contextual Chunk Headers | Prepends descriptive headers to each chunk before embedding. |
| 6. Document Augmentation RAG | Generates questions from text chunks to augment the retrieval process. |
| 7. Query Transform | Rewrites, expands, or decomposes queries to improve retrieval. Includes Step-back Prompting and Sub-query Decomposition. |
| 8. Reranker | Re-ranks initially retrieved results using an LLM for better relevance. |
| 9. RSE | Relevant Segment Extraction: Identifies and reconstructs continuous segments of text, preserving context. |
| 10. Contextual Compression | Implements contextual compression to filter and compress retrieved chunks, maximizing relevant information. |
| 11. Feedback Loop RAG | Incorporates user feedback to learn and improve the RAG system over time. |
| 12. Adaptive RAG | Dynamically selects the best retrieval strategy based on query type. |
| 13. Self RAG | Implements Self-RAG, which dynamically decides when and how to retrieve, evaluates relevance, and assesses support and utility. |
| 14. Proposition Chunking | Breaks down documents into atomic, factual statements for precise retrieval. |
| 15. Multimodal RAG | Combines text and images for retrieval, generating captions for images using LLaVA. |
| 16. Fusion RAG | Combines vector search with keyword-based (BM25) retrieval for improved results. |
| 17. Graph RAG | Organizes knowledge as a graph, enabling traversal of related concepts. |
| 18. Hierarchy RAG | Builds hierarchical indices (summaries + detailed chunks) for efficient retrieval. |
| 19. HyDE RAG | Uses Hypothetical Document Embeddings to improve semantic matching. |
| 20. CRAG | Corrective RAG: Dynamically evaluates retrieval quality and uses web search as a fallback. |
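Several of these techniques (Semantic Chunking, the Chunk Size Selector, Context Enriched RAG) build on a plain fixed-size chunker. As a taste of the from-scratch style used throughout, here is a minimal sketch of such a chunker; the function name and parameters are illustrative, not the repo's exact code:

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    so context at chunk boundaries is not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# 250 characters with chunk_size=100 and overlap=20 -> 4 overlapping chunks
chunks = chunk_text("a" * 250, chunk_size=100, overlap=20)
```

The overlap parameter is the knob the Chunk Size Selector notebook experiments with: larger chunks keep more context per retrieval hit, smaller chunks give more precise matches.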
## 🏗️ Repository Structure
```
fareedkhan-dev-all-rag-techniques/
├── README.md                          <- You are here!
├── 1_simple_rag.ipynb
├── 2_semantic_chunking.ipynb
├── 3_chunk_size_selector.ipynb
├── 4_context_enriched_rag.ipynb
├── 5_contextual_chunk_headers_rag.ipynb
├── 6_doc_augmentation_rag.ipynb
├── 7_query_transform.ipynb
├── 8_reranker.ipynb
├── 9_rse.ipynb
├── 10_contextual_compression.ipynb
├── 11_feedback_loop_rag.ipynb
├── 12_adaptive_rag.ipynb
├── 13_self_rag.ipynb
├── 14_proposition_chunking.ipynb
├── 15_multimodel_rag.ipynb
├── 16_fusion_rag.ipynb
├── 17_graph_rag.ipynb
├── 18_hierarchy_rag.ipynb
├── 19_HyDE_rag.ipynb
├── 20_crag.ipynb
├── requirements.txt                   <- Python dependencies
└── data/
    ├── val.json                       <- Sample validation data (queries and answers)
    ├── AI_information.pdf             <- A sample PDF document for testing
    └── attention_is_all_you_need.pdf  <- A sample PDF for the Multi-Modal RAG notebook
```
## 🛠️ Getting Started
1. **Clone the repository:**

   ```bash
   git clone https://github.com/FareedKhan-dev/all-rag-techniques.git
   cd all-rag-techniques
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up your OpenAI API key:**

   Obtain an API key from Nebius AI, then set it as an environment variable:

   ```bash
   export OPENAI_API_KEY='YOUR_NEBIUS_AI_API_KEY'
   ```

   On Windows:

   ```bash
   setx OPENAI_API_KEY "YOUR_NEBIUS_AI_API_KEY"
   ```

   Or set it within your Python script/notebook:

   ```python
   import os
   os.environ["OPENAI_API_KEY"] = "YOUR_NEBIUS_AI_API_KEY"
   ```

4. **Run the notebooks:**

   Open any of the Jupyter Notebooks (`.ipynb` files) using Jupyter Notebook or JupyterLab. Each notebook is self-contained and can be run independently; cells within a notebook are designed to be executed in order.

   Note: The `data/AI_information.pdf` file provides a sample document for testing; you can replace it with your own PDF. The `data/val.json` file contains sample queries and ideal answers for evaluation, and `attention_is_all_you_need.pdf` is used by the Multi-Modal RAG notebook.
## 💡 Core Concepts
- **Embeddings:** Numerical representations of text that capture semantic meaning. We use Nebius AI's embedding API and, in many notebooks, the `BAAI/bge-en-icl` embedding model.
- **Vector Store:** A simple database to store and search embeddings. We create our own `SimpleVectorStore` class using NumPy for efficient similarity calculations.
- **Cosine Similarity:** A measure of similarity between two vectors. Higher values indicate greater similarity.
- **Chunking:** Dividing text into smaller, manageable pieces. We explore various chunking strategies.
- **Retrieval:** The process of finding the most relevant text chunks for a given query.
- **Generation:** Using a Large Language Model (LLM) to create a response based on the retrieved context and the user's query. We use the `meta-llama/Llama-3.2-3B-Instruct` model via Nebius AI's API.
- **Evaluation:** Assessing the quality of the RAG system's responses, often by comparing them to a reference answer or using an LLM to score relevance.
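The vector-store and cosine-similarity concepts fit in a few lines of NumPy. Here is a minimal sketch of what a `SimpleVectorStore` might look like; the repo's actual class and method signatures may differ, and the tiny 2-D embeddings below stand in for real embedding-API output:

```python
import numpy as np

class SimpleVectorStore:
    """Minimal in-memory vector store: keeps embeddings in a list and
    ranks stored texts by cosine similarity against a query embedding."""

    def __init__(self):
        self.vectors = []  # embedding vectors
        self.texts = []    # the chunk each vector represents

    def add(self, text: str, embedding) -> None:
        self.texts.append(text)
        self.vectors.append(np.asarray(embedding, dtype=float))

    def search(self, query_embedding, top_k: int = 3):
        q = np.asarray(query_embedding, dtype=float)
        # cosine similarity: dot product of the vectors over the product of their norms
        sims = [
            float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q)))
            for v in self.vectors
        ]
        ranked = sorted(zip(self.texts, sims), key=lambda p: p[1], reverse=True)
        return ranked[:top_k]

store = SimpleVectorStore()
store.add("cats", [1.0, 0.0])
store.add("dogs", [0.9, 0.1])
store.add("stocks", [0.0, 1.0])
results = store.search([1.0, 0.05], top_k=2)  # nearest texts with their scores
```

In a real notebook the embeddings would come from the embedding API and the top-ranked texts would be passed to the LLM as context for generation.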
## 🤝 Contributing
Contributions are welcome!