1,854 results for “topic:data-extraction”
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Python scraper based on AI
🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥
LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
Extract keywords from sentences or replace keywords in sentences.
A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.
Python package for scraping recipe data
ContextGem: Effortless LLM extraction from documents
The undetected self-hosted browser automation platform. Powered by Camoufox (Firefox) for 0% detection rates. Built for speed, privacy, and scalability.
Converts a PDF file into a text file while preserving the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. It is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Lightweight library for scraping websites with LLMs
A beginner-friendly yet powerful Python toolkit for financial analysis and automation, built to make modern investing accessible to everyone
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
📰 Let ChatGPT Summarize Hacker News for You
🚜 Parse text and tables from PDF files.
🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.
Undetected web-scraping & seamless HTML parsing in Python!
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8 ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.
Benchmarking PDF libraries
High-performance web crawler API optimized for LLMs. Turn any search or website into clean Markdown using remote browsers. Firecrawl alternative
A tool for scraping emails, social media accounts, and other information from websites using Google search results.
Wikipedia information extraction library
A Python client for the Sypht API
A Python utility to digitize plots.
This repository provides usage examples for the Python module Newspaper3k.
A PDF library for Rust