Ashis-Mishra07/WebScrapper

This is a web scraper for Reddit and Chrome.

🧠 AI-Powered WebScraping Assistant

A real-time AI assistant built with Streamlit, FastAPI, LangGraph, and Groq. It scrapes Reddit and the open web, using Bright Data MCP to get past anti-bot mechanisms. It speaks its responses through text-to-speech (TTS) and integrates Ollama for local LLM inference.


🚀 Features

  • 🔍 Web Scraping Engine
    Scrapes Reddit and browser-based content using automated headless browsing and anti-bot evasion (Bright Data MCP).

  • 🧠 Groq LLM Integration
    Blazing-fast, accurate responses using Groq-powered language models for natural dialogue.

  • 🗺️ LangGraph-Based Flow
    Modular, multi-step reasoning pipeline using LangGraph for structured agent behavior.

  • 🧩 Local Model Support (Ollama)
    Plug-and-play support for running local LLMs via Ollama.

  • 🗣️ Text-to-Speech (TTS)
    Converts AI responses into human-like speech for a natural conversation experience.

  • 🖥️ Streamlit UI + FastAPI Backend
    Beautiful and fast front-end for real-time interaction, powered by FastAPI REST services.
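
The multi-step flow described above (scrape, answer, speak) can be pictured as a chain of functions that each read and extend a shared state. The sketch below is a plain-Python stand-in and does not use LangGraph itself. The step names `scrape`, `answer`, and `speak`, the state keys, and the `run_pipeline` helper are all hypothetical, and each step returns canned text where the real project would call Bright Data MCP, a Groq or Ollama model, or a TTS engine.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def scrape(state: State) -> State:
    # Hypothetical stand-in for a Bright Data MCP scraping call.
    return {**state, "raw": f"posts about {state['query']}"}


def answer(state: State) -> State:
    # Hypothetical stand-in for a Groq or Ollama model call.
    return {**state, "answer": f"Summary of {state['raw']}"}


def speak(state: State) -> State:
    # Hypothetical stand-in for TTS: records the text that would be spoken.
    return {**state, "spoken": state["answer"]}


def run_pipeline(steps: List[Step], state: State) -> State:
    # Run each step in order, passing the updated state along.
    for step in steps:
        state = step(state)
    return state


result = run_pipeline([scrape, answer, speak], {"query": "python"})
print(result["answer"])
```

Keeping every step as a function from state to state is what makes the pipeline modular: steps can be reordered, replaced, or tested on their own.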


🛠️ Tech Stack

| Technology | Purpose |
| --- | --- |
| Streamlit | Frontend interface |
| FastAPI | Backend API server |
| Bright Data MCP | Web scraping and anti-bot bypassing |
| Groq | LLM for fast and intelligent responses |
| LangGraph | Multi-agent reasoning and logic graph |
| Ollama | Local LLM support (e.g. LLaMA2, Mistral) |
| TTS | Voice output from textual response |
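
Since the stack lists both a hosted model provider (Groq) and a local one (Ollama), one possible way to choose between them at startup is sketched below. This is an assumption about how such a switch could work, not the project's actual logic. `pick_llm_backend`, the `OLLAMA_MODEL` variable, and the rule of falling back to Ollama are hypothetical. The two base URLs are Groq's OpenAI-compatible endpoint and Ollama's default local address.

```python
from typing import Dict, Mapping

GROQ_BASE_URL = "https://api.groq.com/openai/v1"  # Groq's OpenAI-compatible endpoint
OLLAMA_BASE_URL = "http://localhost:11434"        # Ollama's default local address


def pick_llm_backend(env: Mapping[str, str]) -> Dict[str, str]:
    """Choose Groq when an API key is present, otherwise a local Ollama model.

    The fallback rule and the OLLAMA_MODEL variable are assumptions
    made for this sketch.
    """
    if env.get("GROQ_API_KEY", "").strip():
        return {"provider": "groq", "base_url": GROQ_BASE_URL}
    return {
        "provider": "ollama",
        "base_url": OLLAMA_BASE_URL,
        "model": env.get("OLLAMA_MODEL", "mistral"),
    }


print(pick_llm_backend({"GROQ_API_KEY": "example-key"})["provider"])
print(pick_llm_backend({})["provider"])
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the choice easy to test.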

📸 Screenshots

Coming Soon – Add your screenshots or demo GIFs here.


🧪 Installation

# Clone the repo
git clone https://github.com/yourusername/ai-webscraping-assistant.git
cd ai-webscraping-assistant

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Fill in required keys like GROQ_API_KEY, BRIGHTDATA credentials, etc.

# Run backend
python backend.py

# Run Streamlit app
streamlit run app.py
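
Before starting the backend, it can help to confirm that the keys from `.env` are actually set. The following is a minimal sketch of such a check. `GROQ_API_KEY` comes from the steps above, while the `missing_keys` helper is hypothetical. The Bright Data variable names are not given here, so they are left for you to add.

```python
from typing import List, Mapping, Sequence

# GROQ_API_KEY is named in the setup steps; append the Bright Data
# credential names used in your own .env file (not specified here).
REQUIRED_KEYS = ("GROQ_API_KEY",)


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in required if not env.get(key, "").strip()]


# Example with an explicit mapping; pass os.environ in real use.
print(missing_keys({"GROQ_API_KEY": ""}))
```

In real use you would call `missing_keys(os.environ)` after loading `.env` and stop with a clear message if the returned list is not empty.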