Prathameshsci369/-Bharat-Regional-Language-Fact-News-Detection-System

Bharat AI is a next-generation misinformation detection framework combining local LLMs and transparent reasoning. It offers multilingual verification, context awareness, and a user-friendly interface for real-time fact-checking across India.

🇮🇳 Bharat: Regional Language Fact News Detection System


📘 Overview

Bharat is a multilingual, agentic AI system built with Streamlit, designed to detect and verify misinformation across 22 Indian languages.
It continuously monitors platforms like Reddit, YouTube, and regional news portals, detects claims, verifies them against trusted sources, and presents transparent, evidence-backed results.

The platform combines local large language models (LLMs), such as Phi-4 run through llama-cpp-python, with Gemini-based query generation, aiming for both accuracy and explainability.


⚙️ How It Works (Pipeline Overview)

The analysis pipeline executes in five main stages:

1️⃣ User Query Input

  • The user enters a topic or claim in the UI.
  • This query is passed directly to the reddit.py module, which initiates a targeted search on Reddit.

2️⃣ Data Scraping (PRAW)

  • The Reddit API (via PRAW) is used to scrape relevant posts and comments.
  • The collected data is stored in a JSON file named reddit_search_output.json.
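
The scraping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual reddit.py: the helper names (`submission_to_record`, `search_reddit`, `save_records`), the selected fields, and the choice to search `r/all` are assumptions. It expects an already configured `praw.Reddit` instance.

```python
import json
from typing import Any, Dict, Iterable, List


def submission_to_record(submission: Any) -> Dict[str, Any]:
    """Keep a few fields from a PRAW submission (the field choice is an assumption)."""
    return {
        "title": submission.title,
        "text": submission.selftext,
        "score": submission.score,
        "url": f"https://www.reddit.com{submission.permalink}",
    }


def search_reddit(reddit: Any, query: str, limit: int = 25) -> List[Dict[str, Any]]:
    """Search all of Reddit for `query` using a configured praw.Reddit instance."""
    results = reddit.subreddit("all").search(query, limit=limit)
    return [submission_to_record(s) for s in results]


def save_records(
    records: Iterable[Dict[str, Any]],
    path: str = "reddit_search_output.json",
) -> None:
    """Write records as UTF-8 JSON so regional-language text stays readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(list(records), handle, ensure_ascii=False, indent=2)
```

Passing `ensure_ascii=False` keeps Devanagari, Tamil, and other scripts as readable text in the file instead of `\u` escape sequences.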

3️⃣ Data Preparation (LangChain)

  • The scraped text is split into smaller, meaningful chunks using RecursiveCharacterTextSplitter.
  • These chunks are grouped into batches for efficient parallel analysis.
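
To make the chunk-then-batch idea concrete, here is a plain-Python stand-in. It does not use LangChain: the project relies on RecursiveCharacterTextSplitter, which prefers to split on separators such as paragraphs and sentences, whereas this sketch simply cuts fixed-size windows with overlap. The function names and the default sizes are assumptions.

```python
from typing import Iterator, List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def make_batches(chunks: List[str], batch_size: int = 4) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]
```

The overlap means a claim that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some repeated text.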

4️⃣ Claim Verification (Local LLM - Phi-4)

  • Each batch is analyzed by the local LLM (Phi-4) via llama-cpp-python.

  • The model performs reasoning and zero-shot classification, identifying and labeling each claim as:

    • ✅ True
    • ❌ False
    • ⚠️ Misleading
    • ❓ Unverifiable

5️⃣ Multilingual Explanation & Visualization (Streamlit)

  • The results are displayed in the Streamlit app (final5.py) with:

    • Interactive visual charts using Altair
    • Color-coded claim cards
    • Confidence scores, sources, and explanations in a readable format
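
Before charting, the per-claim verdicts need to be aggregated. The sketch below shows one way to do that; the `label` key, the color names, and the function name are assumptions about how results might be shaped, not the structure used in final5.py.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical color scheme for the claim cards and chart bars.
LABEL_COLORS = {
    "True": "green",
    "False": "red",
    "Misleading": "orange",
    "Unverifiable": "gray",
}


def summarize_labels(results: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Count verdicts per label and attach a display color to each row.

    Labels outside LABEL_COLORS are ignored; labels with no claims get count 0.
    """
    counts = Counter(r["label"] for r in results)
    return [
        {"label": label, "count": counts[label], "color": color}
        for label, color in LABEL_COLORS.items()
    ]
```

The returned rows can be loaded into a pandas DataFrame and drawn as an Altair bar chart, which Streamlit renders with `st.altair_chart`.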

🚀 Setup & Installation

1. Prerequisites

Before running, ensure you have:

  • 🐍 Python 3.9+
  • 💾 A quantized LLM model file (e.g., phi4.gguf)
  • 🔑 Reddit API credentials (for live scraping)

2. Install Dependencies

Install all necessary Python packages:

pip install -r requirements.txt

3. Configure Reddit API Credentials

Before running live analysis, authenticate the Reddit scraper.

🔹 Step 1: Create a Reddit App

  1. Go to Reddit App Preferences (https://www.reddit.com/prefs/apps).

  2. Click "Create App" → select "script" type.

  3. Fill in:

    • Name: Bharat-AI
    • Redirect URI: http://localhost:8080
  4. Save to obtain your Client ID and Client Secret.

🔹 Step 2: Apply Credentials in reddit.py

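import os
import praw

# Read credentials from environment variables. The string defaults are
# placeholders only and will not authenticate against Reddit.
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID", "your_client_id")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET", "your_client_secret")
REDDIT_USER_AGENT = os.getenv("REDDIT_USER_AGENT", "Bharat-FactCheck-App by /u/your_username")

# Without a username and password, PRAW creates a read-only client,
# which is enough for searching posts and comments.
reddit = praw.Reddit(
    client_id=REDDIT_CLIENT_ID,
    client_secret=REDDIT_CLIENT_SECRET,
    user_agent=REDDIT_USER_AGENT,
)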

💡 Tip: Set the credentials as environment variables in your terminal instead of hard-coding them:

export REDDIT_CLIENT_ID="your_client_id"
export REDDIT_CLIENT_SECRET="your_client_secret"
export REDDIT_USER_AGENT="Bharat-FactCheck-App by /u/your_username"
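
Because reddit.py falls back to placeholder strings, a forgotten variable only surfaces later, when Reddit rejects the request. A small pre-flight check can catch this earlier. The helper below is a hypothetical addition, not part of the project; it only reports which of the three variables are unset or blank.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_credentials()` at startup and stopping with a clear message when the list is non-empty avoids a confusing authentication error partway through a live analysis.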

4. Run the Application

Once dependencies and credentials are ready, launch the app:

streamlit run final5.py

Then open the app in your browser:

http://localhost:8501

You'll see the Bharat Dashboard, where you can:

  • Enter any claim or topic (e.g., "Harshad Mehta Scam 1992")

  • Choose between:

    • Live Analysis (Full Pipeline): runs the full Reddit + LLM workflow
    • Test Mode (Mock Data): runs a demo with built-in examples
  • Explore visual summaries, classification metrics, and detailed explanations.


🧠 In-App Information Sections

The updated UI includes:

  • Title: 🇮🇳 Bharat: Regional Language Fact News Detection System
  • Subheader: Highlights the multilingual, evidence-based approach.
  • Expander Section: Describes the purpose and methodology.
  • Sidebar Tagline: Short project summary for context.
  • Footer: Built for the Agentic AI - Misinformation Track | Team Bharat

🖥️ Example Output

  • Interactive classification charts
  • Summarized claim statistics
  • Transparent, citation-backed reasoning

✅ TRUE CLAIM
Claim: "Harshad Mehta was trapped by bureaucrats and journalists."
Reason: Supported by multiple Reddit posts verifying this narrative.
Source URL: https://www.reddit.com/r/indianews/comments/def456/

📁 Project Structure

| File | Description |
| --- | --- |
| final5.py | 🎨 Streamlit App: updated UI and orchestration logic. |
| final.py | 🧠 Analysis Core: chunking, batching, and LLM-based reasoning. |
| reddit.py | 🔎 Reddit Scraper: PRAW integration and search management. |
| requirements.txt | 📦 Project dependencies. |
| reddit_search_output.json | 💾 Raw scraped Reddit data. |
| README.md | 📘 Documentation file. |

🤝 Contributing

We welcome contributions!
Improve multilingual support, optimize prompts, or enhance visualization, then submit a PR.


📜 License

Licensed under the MIT License; see LICENSE for details.


💡 Summary

Bharat AI is a next-generation misinformation detection framework combining local LLMs and transparent reasoning.
It offers multilingual verification, context awareness, and a user-friendly interface for real-time fact-checking across India.

Developed with ❤️ by Team Bharat | Agentic AI - Misinformation Track