# KnowInfo - Crisis Misinformation Detection System

Made at Mumbai Hacks 2025 by Codebreakers.

A real-time AI-powered system for detecting, verifying, and correcting misinformation during global crises.
## Features
- Multi-Source Monitoring: Ingests content from Twitter/X, Reddit, Telegram, and RSS feeds
- AI-Powered Claim Extraction: Uses LLMs (Ollama, Gemini, OpenAI) to extract verifiable claims
- RAG-Based Verification: Cross-references claims against authoritative sources (WHO, CDC, Reuters, etc.)
- Patient Zero Tracking: Graph-based analysis to trace misinformation origins and spread
- Dual-Mode Telegram Integration:
  - Ingestion: monitors public channels for crisis intel
  - Response: interactive bot for user verification
- WhatsApp Bot: Instant fact-checking via WhatsApp
- Real-Time Dashboard: Live monitoring of trending false claims
- Continuous Learning: A/B testing and feedback integration
## Architecture

### 6-Stage Pipeline

```
Stage 1: Ingestion → Stage 2: Extraction → Stage 3: Verification
                                                        ↓
Stage 6: Learning ← Stage 5: Response  ←  Stage 4: Tracking
```
1. Ingestion & Monitoring: stream content from social media platforms
2. Claim Extraction: NLP-based extraction and categorization (P0-P3 priority)
3. Verification: RAG engine with vector database for fact-checking
4. Patient Zero Tracking: Neo4j graph for propagation analysis
5. Response Generation: WhatsApp bot, Dashboard API, deep-dive reports
6. Continuous Learning: feedback loops and adversarial training
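The six stages above can be sketched as a simple async relay in which each stage consumes the previous stage's output. This is a minimal illustration with stub functions, not the project's actual interfaces; the real stage classes live under `src/stage1_ingestion/` through `src/stage6_learning/`:

```python
import asyncio

# Hypothetical stub stages standing in for the real implementations in
# src/stageN_*/. Each one transforms the previous stage's output.
async def ingest():         return [{"text": "Breaking: miracle cure announced"}]
async def extract(items):   return [{"claim": i["text"], "priority": "P1"} for i in items]
async def verify(claims):   return [{**c, "status": "false", "confidence": 0.92} for c in claims]
async def track(results):   return results   # patient-zero graph updates
async def respond(results): return results   # bot replies / dashboard pushes
async def learn(results):   return results   # feedback loops

async def run_pipeline():
    items = await ingest()
    claims = await extract(items)
    results = await verify(claims)
    for stage in (track, respond, learn):   # stages 4-6 enrich in sequence
        results = await stage(results)
    return results

results = asyncio.run(run_pipeline())
```

The key design point is that each stage only depends on the shape of the previous stage's output, so monitors, extractors, and verifiers can be swapped independently.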
## Quick Start

### Prerequisites
- Docker & Docker Compose
- (Optional) NVIDIA GPU for local models
- API Keys (optional): Gemini, OpenAI, or Anthropic
### Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd KnowInfo
   ```

2. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your API keys and configuration
   ```

3. Start services with Docker Compose:

   ```bash
   docker-compose up -d
   ```

   This will start:
   - MongoDB (port 27017)
   - Neo4j (ports 7474, 7687)
   - Redis (port 6379)
   - Ollama (port 11434)
   - KnowInfo API (port 8000)

4. Set up Ollama models (if using local models):

   ```bash
   chmod +x scripts/setup_ollama.sh
   ./scripts/setup_ollama.sh
   ```

5. Seed the knowledge base:

   ```bash
   docker-compose exec api python scripts/seed_knowledge_base.py
   ```

6. Access the system:
   - API: http://localhost:8000
   - API Docs: http://localhost:8000/docs
   - Neo4j Browser: http://localhost:7474
   - Health Check: http://localhost:8000/health
## Configuration

### Model Selection

KnowInfo supports multiple LLM providers. Configure them in `.env`:
```bash
# Use local models first (free, private)
USE_LOCAL_MODELS_FIRST=true

# Ollama (local, free)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL_EMBEDDING=nomic-embed-text
OLLAMA_MODEL_EXTRACTION=llama3.2

# Gemini (recommended for API)
GEMINI_API_KEY=your_key_here

# OpenAI (alternative)
OPENAI_API_KEY=your_key_here
```

### Social Media APIs

Configure social media monitoring:
```bash
# Twitter/X
TWITTER_API_KEY=your_key
TWITTER_API_SECRET=your_secret
TWITTER_BEARER_TOKEN=your_token

# Reddit
REDDIT_CLIENT_ID=your_id
REDDIT_CLIENT_SECRET=your_secret

# Telegram
TELEGRAM_BOT_TOKEN=your_token

# WhatsApp (Twilio)
TWILIO_ACCOUNT_SID=your_sid
TWILIO_AUTH_TOKEN=your_token
```

## Usage Examples

### Verifying a Claim (Python)
```python
import asyncio
from datetime import datetime

from src.stage2_extraction.claim_extractor import ClaimExtractor
from src.stage3_verification.rag_engine import RAGEngine
from src.models.content import Content, SourcePlatform

async def verify_claim_example():
    # Create content
    content = Content(
        content_id="test123",
        source=SourcePlatform.TWITTER,
        platform_id="12345",
        text="Breaking: WHO announces new miracle cure for COVID-19",
        author_id="user123",
        author_username="newsbot",
        created_at=datetime.utcnow(),
    )

    # Extract claims
    extractor = ClaimExtractor()
    claims = await extractor.extract_claims(content)

    # Verify first claim
    if claims:
        rag_engine = RAGEngine(
            knowledge_base_path="./data/knowledge_base",
            vector_db_path="./data/vector_db",
        )
        verification = await rag_engine.verify_claim(claims[0])
        print(f"Status: {verification.status}")
        print(f"Confidence: {verification.confidence_score}%")
        print(f"Explanation: {verification.explanation}")

asyncio.run(verify_claim_example())
```

### API Usage
```bash
# Check system health
curl http://localhost:8000/health

# Get metrics
curl http://localhost:8000/metrics

# Verify claim via API (when implemented)
curl -X POST http://localhost:8000/api/v1/verify \
  -H "Content-Type: application/json" \
  -d '{"claim_text": "WHO says vaccines cause autism"}'
```

## Database Schema

### MongoDB Collections
- contents: Raw social media posts
- claims: Extracted verifiable claims
- verifications: Verification results
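As a rough illustration of how these collections fit together, a document in the `claims` collection might look like the following. The field names here are assumptions for illustration; the authoritative schema is the Pydantic model under `src/models/`:

```python
# Illustrative shape of a `claims` document (hypothetical field names; see the
# Pydantic models in src/models/ for the real schema). The content_id field
# links back to the raw post in the `contents` collection, and verification
# results referencing this claim land in `verifications`.
claim_doc = {
    "claim_id": "c-8f3a",
    "content_id": "test123",   # back-reference to `contents`
    "text": "WHO announces new miracle cure for COVID-19",
    "category": "medical",
    "priority": "P1",          # P0-P3 priority from the extraction stage
    "extracted_at": "2025-03-01T12:00:00Z",
}
```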
### Neo4j Graph Model

```
(User)-[:POSTED]->(Post)-[:CONTAINS]->(Claim)
(Post)-[:SHARED_FROM]->(OriginalPost)
```
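Tracing a claim back to its earliest post ("patient zero") means walking `SHARED_FROM` edges backwards from every post containing the claim. A sketch of such a Cypher query, built as a parameterized string (the query itself is illustrative; node and relationship names follow the model above):

```python
# Builds an illustrative parameterized Cypher query over the graph model above:
# find posts containing the claim, follow SHARED_FROM chains to their roots,
# and return the earliest root post.
def patient_zero_query(claim_id: str) -> tuple[str, dict]:
    cypher = (
        "MATCH (p:Post)-[:CONTAINS]->(c:Claim {claim_id: $claim_id}) "
        "OPTIONAL MATCH path = (p)-[:SHARED_FROM*]->(origin:Post) "
        "WITH coalesce(last(nodes(path)), p) AS root "
        "RETURN root ORDER BY root.created_at ASC LIMIT 1"
    )
    return cypher, {"claim_id": claim_id}

query, params = patient_zero_query("c-8f3a")
# The pair can then be passed to a Neo4j session, e.g. session.run(query, params).
```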
### Redis Keys

- `verification:{hash}`: cached verification results
- `velocity:{claim_hash}`: trending claim counters
- `rate_limit:{user_id}`: rate limiting
- `queue:{queue_name}`: background task queues
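The cache keys above take a stable hash of the claim text so that repeated queries hit the same entry. One way to derive such a key (a sketch; the exact normalization and hashing scheme is an assumption, not necessarily what the codebase uses):

```python
import hashlib

def claim_hash(claim_text: str) -> str:
    # Normalize case and whitespace so trivially different phrasings of the
    # same claim map to the same cache entry, then take a short SHA-256 prefix.
    normalized = " ".join(claim_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

def verification_key(claim_text: str) -> str:
    return f"verification:{claim_hash(claim_text)}"

key = verification_key("  WHO says vaccines cause autism ")
# The key can then back a Redis lookup, e.g. redis.get(key) before re-verifying.
```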
## Monitoring & Metrics

### Prometheus Metrics

- `content_ingested_total`: total content items by source
- `claims_extracted_total`: claims by category/priority
- `verifications_completed_total`: verifications by status
- `verification_duration_seconds`: verification latency
- `whatsapp_queries_total`: WhatsApp bot usage

Access metrics at http://localhost:8000/metrics.
## Safety Guardrails

### Built-in Protections
- Confidence Thresholds: Won't declare claims false with <80% confidence
- Precautionary Principle: Flags potentially harmful claims even without full verification
- Privacy Compliance: Automatically redacts PII
- Bias Mitigation: Audits source diversity and political balance
- Expert Review: Flags low-confidence and P0 claims for human review
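The first two protections combine into a simple decision rule: a claim is only published as false above the confidence threshold, while potentially harmful claims get flagged for human review even without a confident verdict. A minimal sketch of that rule (the threshold value and field names come from this section; the function itself is an assumption, not the project's actual code):

```python
CONFIDENCE_THRESHOLD = 0.80  # never declare a claim false below this

def triage(status: str, confidence: float, potentially_harmful: bool) -> str:
    """Return a publishable verdict, deferring to human review when unsure."""
    # Precautionary principle: harmful claims without a confident "false"
    # verdict go to expert review rather than being published either way.
    if potentially_harmful and (status != "false" or confidence < CONFIDENCE_THRESHOLD):
        return "flagged_for_review"
    # Confidence threshold: only publish "false" when evidence is strong.
    if status == "false" and confidence >= CONFIDENCE_THRESHOLD:
        return "false"
    if status == "false":
        return "unverified"  # not confident enough to declare it false
    return status
```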
### Priority Levels
- P0: Imminent physical harm (evacuation orders, poisoned supplies)
- P1: Medical misinformation during health crises
- P2: False attribution to authorities
- P3: Other verifiable false claims
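As a rough illustration, priority assignment can start as a keyword-driven first pass before the LLM refines it. The keyword lists below are hypothetical examples; the real categorizer lives in `src/stage2_extraction/`:

```python
from enum import Enum

class Priority(str, Enum):
    P0 = "P0"  # imminent physical harm
    P1 = "P1"  # medical misinformation
    P2 = "P2"  # false attribution to authorities
    P3 = "P3"  # other verifiable false claims

# Hypothetical keyword heuristics; a coarse first-pass filter only.
_P0_TERMS = ("evacuation", "evacuate", "poisoned", "bomb")
_P1_TERMS = ("cure", "vaccine", "treatment", "dose")
_P2_TERMS = ("who announces", "cdc says", "government confirms")

def rough_priority(text: str) -> Priority:
    t = text.lower()
    if any(term in t for term in _P0_TERMS):
        return Priority.P0
    if any(term in t for term in _P1_TERMS):
        return Priority.P1
    if any(term in t for term in _P2_TERMS):
        return Priority.P2
    return Priority.P3
```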
## Extending the System

### Adding New Sources

Create a new monitor in `src/stage1_ingestion/`:
```python
from .base_monitor import BaseMonitor
from ..models.content import Content, SourcePlatform

class MyPlatformMonitor(BaseMonitor):
    async def stream_content(self):
        # Implement platform-specific streaming
        while self.is_running:
            # Fetch content
            yield Content(...)
```

### Adding Knowledge Sources
```python
from src.stage3_verification.rag_engine import RAGEngine

rag = RAGEngine(...)
await rag.add_source_to_knowledge_base(
    title="New Authoritative Source",
    content="Content text...",
    url="https://example.com",
    source_type="government",
    credibility="high",
)
```

## Testing
```bash
# Run tests
pytest tests/

# With coverage
pytest --cov=src tests/

# Specific stage
pytest tests/test_stage3/
```

## Project Structure
```
KnowInfo/
├── config.py                 # Configuration management
├── main.py                   # FastAPI application
├── requirements.txt          # Python dependencies
├── docker-compose.yml        # Docker services
│
├── src/
│   ├── database/             # Database managers
│   ├── models/               # Pydantic data models
│   ├── utils/                # Utilities (logging, metrics, etc.)
│   ├── stage1_ingestion/     # Content monitoring
│   ├── stage2_extraction/    # Claim extraction
│   ├── stage3_verification/  # RAG verification
│   ├── stage4_tracking/      # Patient zero tracking
│   ├── stage5_response/      # WhatsApp bot, dashboard
│   └── stage6_learning/      # Continuous learning
│
├── scripts/                  # Setup and utility scripts
├── tests/                    # Test suites
└── data/                     # Data storage
    ├── vector_db/            # ChromaDB vector database
    └── knowledge_base/       # Source documents
```
## Security Considerations

- Store API keys in `.env` (never commit them)
- Use environment-specific configurations
- Implement rate limiting (included)
- Sanitize user inputs
- Audit source selection for bias
- Apply security updates regularly
## Roadmap

### Phase 1: Core Infrastructure (Complete)

- Database setup
- Model manager with Gemini/Ollama support
- Basic claim extraction
- RAG verification engine

### Phase 2: Full Pipeline (In Progress)

- Telegram monitor & bot integration
- Twitter/Reddit monitors
- Complete WhatsApp bot
- Dashboard API
- Patient zero tracking

### Phase 3: Advanced Features

- Image/video analysis (deepfake detection)
- Multi-language support
- Mobile apps
- Browser extension
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new features
4. Submit a pull request
## License
[Add your license here]
## Acknowledgments
- Authoritative sources: WHO, CDC, Reuters, AP
- Open-source tools: FastAPI, MongoDB, Neo4j, ChromaDB, Ollama
- LLM providers: Google (Gemini), Meta (LLaMA), OpenAI
## Contact
[Add contact information]
**Note**: This system is designed for authorized crisis response use. Ensure compliance with platform terms of service and local regulations when deploying.