Prathameshsci369/-Bharat-Regional-Language-Fact-News-Detection-System
Bharat AI is a next-generation misinformation detection framework combining local LLMs and transparent reasoning. It offers multilingual verification, context awareness, and a user-friendly interface for real-time fact-checking across India.
๐ฎ๐ณ Bharat: Regional Language Fact News Detection System
๐ Overview
Bharat is a multilingual, agentic AI system built with Streamlit, designed to detect and verify misinformation across 22 Indian languages.
It continuously monitors platforms like Reddit, YouTube, and regional news portals, detects claims, verifies them against trusted sources, and presents transparent, evidence-backed results.
The platform integrates Local Large Language Models (LLMs) (e.g., Phi-4 via llama-cpp-python) and Gemini-based query generation, offering both accuracy and explainability.
โ๏ธ How It Works (Pipeline Overview)
The analysis pipeline executes in five main stages:
1๏ธโฃ User Query Input
- The user enters a topic or claim in the UI.
- This query is passed directly to the
reddit.pymodule, which initiates a targeted search on Reddit.
2๏ธโฃ Data Scraping (PRAW)
- The Reddit API (via PRAW) is used to scrape relevant posts and comments.
- The collected data is stored in a JSON file named
reddit_search_output.json.
3๏ธโฃ Data Preparation (LangChain)
- The scraped text is split into smaller, meaningful chunks using
RecursiveCharacterTextSplitter. - These chunks are grouped into batches for efficient parallel analysis.
4๏ธโฃ Claim Verification (Local LLM - Phi-4)
-
Each batch is analyzed by the local LLM (Phi-4) via
llama-cpp-python. -
The model performs reasoning and zero-shot classification, identifying and labeling each claim as:
- โ True
- โ False
โ ๏ธ Misleading- โ Unverifiable
5๏ธโฃ Multilingual Explanation & Visualization (Streamlit)
-
The results are displayed in the Streamlit app (
final5.py) with:- Interactive visual charts using Altair
- Color-coded claim cards
- Confidence scores, sources, and explanations in readable format
๐ Setup & Installation
1. Prerequisites
Before running, ensure you have:
- ๐ Python 3.9+
- ๐พ A quantized LLM model file (e.g.,
phi4.gguf) - ๐ Reddit API credentials (for live scraping)
2. Install Dependencies
Install all necessary Python packages:
pip install -r requirements.txt3. Configure Reddit API Credentials
Before running live analysis, authenticate the Reddit scraper.
๐น Step 1: Create a Reddit App
-
Go to Reddit App Preferences.
-
Click "Create App" โ select "script" type.
-
Fill in:
- Name:
Bharat-AI - Redirect URI:
http://localhost:8080
- Name:
-
Save to obtain your Client ID and Client Secret.
๐น Step 2: Apply Credentials in reddit.py
import os
import praw
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID", "your_client_id")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET", "your_client_secret")
REDDIT_USER_AGENT = os.getenv("REDDIT_USER_AGENT", "Bharat-FactCheck-App by /u/your_username")
reddit = praw.Reddit(
client_id=REDDIT_CLIENT_ID,
client_secret=REDDIT_CLIENT_SECRET,
user_agent=REDDIT_USER_AGENT
)๐ก Tip: Export credentials securely from terminal:
export REDDIT_CLIENT_ID="your_client_id" export REDDIT_CLIENT_SECRET="your_client_secret" export REDDIT_USER_AGENT="Bharat-FactCheck-App by /u/your_username"
4. Run the Application
Once dependencies and credentials are ready, launch the app:
streamlit run final5.pyThen open the app in your browser:
http://localhost:8501
Youโll see the Bharat Dashboard, where you can:
-
Enter any claim or topic (e.g., โHarshad Mehta Scam 1992โ)
-
Choose between:
- Live Analysis (Full Pipeline) โ Runs full Reddit + LLM workflow
- Test Mode (Mock Data) โ Runs demo with built-in examples
-
Explore visual summaries, classification metrics, and detailed explanations.
๐ง In-App Information Sections
The updated UI includes:
- Title:
๐ฎ๐ณ Bharat: Regional Language Fact News Detection System - Subheader: Highlights multilingual, evidence-based approach.
- Expander Section: Describes the purpose and methodology.
- Sidebar Tagline: Short project summary for context.
- Footer:
Built for the Agentic AI - Misinformation Track | Team Bharat
๐ฅ๏ธ Example Output
- Interactive classification charts
- Summarized claim statistics
- Transparent, citation-backed reasoning
โ
TRUE CLAIM
Claim: "Harshad Mehta was trapped by bureaucrats and journalists."
Reason: Supported by multiple Reddit posts verifying this narrative.
Source URL: https://www.reddit.com/r/indianews/comments/def456/
๐ Project Structure
| File | Description |
|---|---|
final5.py |
๐จ Streamlit App โ updated UI and orchestration logic. |
final.py |
๐ง Analysis Core โ chunking, batching, and LLM-based reasoning. |
reddit.py |
๐ Reddit Scraper โ PRAW integration and search management. |
requirements.txt |
๐ฆ Project dependencies. |
reddit_search_output.json |
๐พ Raw scraped Reddit data. |
README.md |
๐ Documentation file. |
๐ค Contributing
We welcome contributions!
Improve multilingual support, optimize prompts, or enhance visualization โ just submit a PR.
๐ License
Licensed under the MIT License โ see LICENSE for details.
๐ก Summary
Bharat AI is a next-generation misinformation detection framework combining local LLMs and transparent reasoning.
It offers multilingual verification, context awareness, and a user-friendly interface for real-time fact-checking across India.
Developed with โค๏ธ by Team Bharat | Agentic AI - Misinformation Track