176 results for “topic:content-extraction”
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude, and other LLM clients.
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.
Model Context Protocol (MCP) Server for Graphlit Platform
A fork of Dragnet that also extracts author, headline, date, and keywords from context, plus built-in metadata extraction, all in one package.
A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI assistants to search the web and extract webpage content programmatically.
Readability2 converts HTML to plain text.
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles.
DOM Based Content Extraction via Text Density
Web content extraction using machine learning
A collection of OpenClaw Agent Skills — search, analysis, content extraction, and more.
🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader
Local browser toolkit for AI agents: deep research and browser-use automation with local Chrome (CDP) + Playwright. Flexible, extensible scripts for web navigation, extraction, and workflow automation, built for reproducible research and agent-driven browsing.
Tool to extract the text from web article URLs and produce word frequencies, named-entity recognition, automatic summaries, and more.
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog!
A userscript that adds a button to YouTube video pages for copying the transcript with or without timestamps.
This repository houses a Python application for extracting YouTube video transcripts and summarizing their content.
Extract meaningful content from the chaos of a web page
Live Web Access for Your Local AI — Tunable Search & Clean Content Extraction
Simple web crawler in Go that extracts content via text density
Pure Rust document-to-Markdown converter for LLM workflows (DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, images).
Seize is a lightweight Node or browser web-page content extractor inspired by Arc90 Readability and Safari Reader.
📸 Crawell – browser extension for one-click image & article extraction, Markdown conversion, and bulk download. Free, with 100% local processing.
Chrome extension to copy YouTube transcripts with AI-friendly features
The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.
Mobile-First Indexing Tool
Convert webpages to clean Markdown for LLM and RAG workflows. Browser-based UI + Node.js CLI with selector drilling, metadata extraction, and batch processing.
Web content extraction engine backed by Qt WebEngine.
This Python-based repository hosts a sophisticated service designed for scraping web articles and converting them into Markdown format. The core functionality of this service includes extracting the main content of articles, such as headlines, key paragraphs, and associated images, and then seamlessly transforming this content into well-structured…