GitHunt — Discover GitHub Repositories

The undetected self-hosted browser automation platform. Powered by Camoufox (Firefox) for 0% detection rates. Built for speed, privacy, and scalability.

TypeScript · 1.7k · 227 · Updated 8 hours ago

automation, automation-api, automation-platform, browser-automation, browser-testing, +14

JonathanLink/PDFLayoutTextStripper

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).

Java · 1.6k · 214 · Updated 5 days ago

data-extraction, extract, java, layout, pdf, +2

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Python · 1.5k · 232 · Updated 1 day ago

big-data-cleaning, bigdata, cudf, dask, dask-cudf, +14

raznem/parsera

Lightweight library for scraping web-sites with LLMs

Python · 1.3k · 74 · Updated 1 week ago

ai, ai-scraping, data-extraction, llm, opensource, +4

thinh-vu/vnstock

A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone

Python · 1.2k · 243 · Updated 1 day ago

data-extraction, quantitative-analysis, quantitative-finance, quantitative-trading, stock-market, +1

eclaire-labs/eclaire

Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.

TypeScript · 819 · 81 · Updated 1 day ago

ai, ai-assistant, automation, bookmark-manager, bookmarks, +14

polyrabbit/hacker-news-digest

📰 Let ChatGPT Summarize Hacker News for You

Python · 744 · 96 · Updated 1 week ago

chatgpt, chatgpt-api, crawler, data-extraction, extract-summaries, +10

adrienjoly/npm-pdfreader

🚜 Parse text and tables from PDF files.

HTML · 699 · 88 · Updated 1 week ago

data-extraction, javascript, parse-tables, parsing, pdf-converter, +3

ScrapeGraphAI/scrapecraft

🤖 AI-powered web scraping editor with visual workflow builder. Build, test & deploy web scrapers using natural language. Powered by ScrapeGraphAI & LangGraph.

Python · 610 · 97 · Updated 3 days ago

ai, automation, data-extraction, docker, fastapi, +8

a-maliarov/amazoncaptcha

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

Python · 487 · 91 · Updated 1 week ago

amazon, amazon-captcha, amazon-scraper, amazoncaptcha, captcha, +5

vakra-dev/reader

Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.

TypeScript · 466 · 32 · Updated 2 days ago

ai, ai-agents, ai-crawler, ai-scraper, anti-bot, +15

jpjacobpadilla/Stealth-Requests

Undetected web-scraping & seamless HTML parsing in Python!

Python · 449 · 45 · Updated 1 week ago

data, data-extraction, html-parsing, http-client, http-requests, +8

yfedoseev/pdf_oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

Rust · 400 · 38 · Updated just now

data-extraction, document-processing, fast, image-extraction, llm, +13

py-pdf/benchmarks

Benchmarking PDF libraries

Python · 327 · 21 · Updated 1 day ago

benchmark, data-extraction, mupdf, pdf, poppler-utils, +2

BrowserCash/teracrawl

High-performance web crawler API optimized for LLMs. Turn any search or website into clean Markdown using remote browsers. Firecrawl alternative

TypeScript · 234 · 26 · Updated 3 days ago

ai-agents, ai-crawler, ai-scraping, ai-search, antibot-bypass, +13

serpapi/clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.