asifnoushadsharafudeen/ai_rag_wikipedia

Offline RAG pipeline: Wikipedia → Vector DB → LLM QA using LangChain and FAISS

🧠 RAG Wikipedia QA (Offline)

A fully offline Retrieval-Augmented Generation (RAG) system that lets you query any Wikipedia topic using a local LLM and vector store, with no internet or API keys required.


🚀 Features

✅ Search any Wikipedia topic and save the content locally
✅ Embed saved documents into a local FAISS vector store
✅ Ask questions using Retrieval-Augmented Generation (RAG)
✅ Runs fully offline: no API keys, no cloud dependency
✅ Lightweight model (sshleifer/tiny-gpt2) that runs even on CPU
✅ CLI interface for easy interaction
✅ LangChain deprecation warnings cleaned


🗂️ Folder Structure

RAG-Wikipedia-QA/
│
├── docs/ # Saved Wikipedia text files
├── embeddings/ # FAISS vector DBs saved here
├── rag_wikipedia.py # Main script
├── wiki.png # Image (Step 1 & 2)
├── QA.png # Image (Step 3)
└── README.md # This file



🔧 How It Works (Step-by-Step)

✅ Step 1 – Fetch Wikipedia Content

  • User is prompted to enter a topic name (e.g., India, Python programming language)
  • The script fetches the article summary and saves it as a .txt file under /docs
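
Step 1 could look roughly like the sketch below. The README does not say which client rag_wikipedia.py uses, so this version assumes the public Wikipedia REST summary endpoint and only the Python standard library. The function names (topic_to_filename, fetch_summary, save_article) and the User-Agent string are hypothetical, chosen for illustration.

```python
import json
import re
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen


def topic_to_filename(topic: str) -> str:
    """Turn a topic such as 'Python programming language' into a safe .txt name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", topic.strip()).strip("_")
    return f"{slug or 'untitled'}.txt"


def fetch_summary(topic: str) -> str:
    """Fetch the article summary (this call needs a network connection)."""
    # Assumption: the REST summary endpoint; the real script may use another client.
    title = quote(topic.strip().replace(" ", "_"), safe="")
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    request = Request(url, headers={"User-Agent": "rag-wikipedia-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)["extract"]


def save_article(topic: str, text: str, docs_dir: str = "docs") -> Path:
    """Write the article text to <docs_dir>/<topic>.txt and return the path."""
    folder = Path(docs_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / topic_to_filename(topic)
    path.write_text(text, encoding="utf-8")
    return path
```

A call such as save_article("India", fetch_summary("India")) would then leave docs/India.txt on disk for the later steps to read.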

✅ Step 2 – Embed Text with FAISS

  • Loads the saved .txt file
  • Splits the text into chunks using LangChain's CharacterTextSplitter
  • Embeds the chunks into vectors using Hugging Face embeddings
  • Stores them in a FAISS vector database (.faiss and .pkl) inside /embeddings
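
A minimal sketch of the embedding step, under several assumptions: the import paths match a recent LangChain release (they have moved between versions), the embedding model is sentence-transformers/all-MiniLM-L6-v2, and the index is saved under the article's file name. None of these are confirmed by the README, and the chunk sizes and the helper names index_files and build_index are illustrative only.

```python
from pathlib import Path


def index_files(name: str, out_dir: str = "embeddings") -> tuple[Path, Path]:
    """Paths that FAISS.save_local writes for a given index name."""
    folder = Path(out_dir)
    return folder / f"{name}.faiss", folder / f"{name}.pkl"


def build_index(txt_path: str, out_dir: str = "embeddings") -> tuple[Path, Path]:
    """Chunk a saved article, embed the chunks, and persist a FAISS index."""
    # Imported lazily so the path helper above works without the heavy packages.
    # Assumption: these import paths fit the installed LangChain version.
    from langchain_text_splitters import CharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    text = Path(txt_path).read_text(encoding="utf-8")

    # Illustrative sizes. Splitting on spaces means even a single-paragraph
    # summary is broken into several chunks.
    splitter = CharacterTextSplitter(separator=" ", chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(text)

    # Assumed embedding model; the project may use a different one.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    store = FAISS.from_texts(chunks, embeddings)
    name = Path(txt_path).stem
    store.save_local(out_dir, index_name=name)  # writes <name>.faiss and <name>.pkl
    return index_files(name, out_dir)
```

For example, build_index("docs/India.txt") would be expected to produce embeddings/India.faiss and embeddings/India.pkl. The embedding model has to be available locally (for instance, already cached) for this step to run without a connection.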

📸 Screenshot: Step 1 & 2 (wiki.png)


✅ Step 3 – Ask a Question (RAG)

  • Prompts the user to enter the same filename used in Steps 1 and 2
  • Loads the vector store and retrieves the chunks most relevant to the question
  • Feeds the retrieved context and the question into a local GPT-2 model
  • Generates and returns an answer offline
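
The retrieval and generation flow can be sketched as follows. This is not the script's actual code: the prompt template, the helper names build_prompt and answer_question, the value of k, and the embedding model are assumptions, and the index is assumed to have been saved as embeddings/<name>.faiss and embeddings/<name>.pkl. The embedding model must be the same one that was used when the index was built.

```python
def build_prompt(chunks: list[str], question: str) -> str:
    """Combine retrieved chunks and the question into one prompt (assumed template)."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {question.strip()}\nAnswer:"


def answer_question(name: str, question: str,
                    index_dir: str = "embeddings", k: int = 3) -> str:
    """Retrieve the k most similar chunks and let a local GPT-2 model answer."""
    # Lazy imports: build_prompt stays usable without the heavy packages.
    # Assumption: these import paths fit the installed LangChain version.
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from transformers import pipeline

    # Assumed embedding model; it must match the one used to build the index.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    # Recent langchain-community releases require opting in to pickle loading.
    # Only enable this for index files you created yourself.
    store = FAISS.load_local(
        index_dir,
        embeddings,
        index_name=name,
        allow_dangerous_deserialization=True,
    )

    docs = store.similarity_search(question, k=k)
    prompt = build_prompt([doc.page_content for doc in docs], question)

    # The model named in the README; it runs on CPU.
    generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")
    output = generator(prompt, max_new_tokens=50)

    # The pipeline returns prompt + continuation; keep only the continuation.
    return output[0]["generated_text"][len(prompt):].strip()
```

A call such as answer_question("India", "What is the capital of India?") returns whatever text the model generates after "Answer:". Because sshleifer/tiny-gpt2 is a very small checkpoint, the output may be incoherent; a larger local model can be swapped in by changing the model name.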

📸 Screenshot: Step 3 (QA.png)


📦 Requirements

Install dependencies using:

pip install -r requirements.txt


👤 Author
Asif Noushad Sharafudeen
🔗 LinkedIn
🔗 GitHub