# 🕷️ OpenClaw Ultra Scraping
Adaptive web scraping skill for OpenClaw agents — bypasses anti-bot, survives site redesigns.
Built by MyClaw.ai — the AI personal assistant platform that gives every user a full server with complete code control. OpenClaw Ultra Scraping is one of many open-source skills that extend your AI agent's capabilities.
中文 | Français | Deutsch | Русский | 日本語 | Italiano | Español
## What is this?
An OpenClaw agent skill that gives your AI assistant web scraping superpowers, built on the Scrapling framework.
Your agent can now:
- 🛡️ Bypass Cloudflare, Turnstile, and other anti-bot systems out of the box
- 🔄 Survive website redesigns with adaptive element tracking
- ⚡ Crawl at scale with concurrent spiders, pause/resume, and proxy rotation
- 🎯 Extract precisely with CSS, XPath, text search, and BeautifulSoup-style selectors
- 🌐 Render JavaScript SPAs with headless browser support
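The "survives redesigns" point rests on fingerprinting: an element is remembered by a bundle of stable features rather than a single CSS class. As a rough illustration of the idea only (not Scrapling's actual implementation), a saved fingerprint can be re-matched by feature similarity after the markup changes:

```python
# Illustrative sketch of fingerprint-style matching -- NOT Scrapling's real
# algorithm, just the general idea behind surviving selector changes.

def fingerprint(tag, attrs, text):
    """Reduce an element to a set of stable features."""
    features = {f"tag:{tag}", f"text:{text.strip().lower()}"}
    features.update(f"{k}={v}" for k, v in attrs.items())
    return features

def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprints (0.0 to 1.0)."""
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Element saved before a redesign...
old = fingerprint("div", {"class": "product", "data-sku": "42"}, "Widget")
# ...and the same element after the class name changed.
new = fingerprint("div", {"class": "item-card", "data-sku": "42"}, "Widget")

# The class changed, but enough features still overlap to re-identify it.
print(similarity(old, new))  # -> 0.6
```

Because the match is scored over many features, no single selector change breaks it.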
## Install

### Via ClawHub (recommended)

```shell
clawhub install openclaw-ultra-scraping
```

### Manual
```shell
# Clone into your OpenClaw skills directory
git clone https://github.com/LeoYeAI/openclaw-ultra-scraping.git ~/.openclaw/workspace/skills/openclaw-ultra-scraping

# Run setup (installs Scrapling + browsers)
bash ~/.openclaw/workspace/skills/openclaw-ultra-scraping/scripts/setup.sh
```

## Usage
Once installed, just ask your OpenClaw agent to scrape anything:
"Scrape the top 10 products from example.com"
"Extract all links from this page"
"Crawl this site and grab all article titles"
"Get the data from this Cloudflare-protected page"
### CLI (for agents & scripts)
```shell
PYTHON=/opt/scrapling-venv/bin/python3

# Simple fetch
$PYTHON scripts/scrape.py fetch "https://example.com" --css ".content"

# Bypass Cloudflare
$PYTHON scripts/scrape.py fetch "https://protected.com" --stealth --solve-cloudflare

# Full browser for SPAs
$PYTHON scripts/scrape.py fetch "https://spa-app.com" --dynamic --css ".data"

# Multi-page crawl
$PYTHON scripts/scrape.py crawl "https://example.com" --depth 2 --concurrency 10 -o results.json

# Extract links
$PYTHON scripts/scrape.py links "https://example.com" --filter "\.pdf$"

# Output formats: json, jsonl, csv, text, markdown, html
$PYTHON scripts/scrape.py fetch "https://example.com" -f markdown -o page.md
```

### Python API (for custom scripts)
```python
#!/opt/scrapling-venv/bin/python3
from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

# Fast HTTP scraping
page = Fetcher.get('https://example.com', impersonate='chrome')
titles = page.css('h1::text').getall()

# Anti-bot bypass
page = StealthyFetcher.fetch('https://cloudflare-site.com', headless=True, solve_cloudflare=True)

# JavaScript rendering
page = DynamicFetcher.fetch('https://react-app.com', headless=True, network_idle=True)

# Adaptive tracking (survives redesigns)
products = page.css('.product', auto_save=True)  # First run: save fingerprints
products = page.css('.product', adaptive=True)   # Later: find even if HTML changed
```

### Fetcher Selection
| Scenario | Fetcher | CLI Flag |
|---|---|---|
| Normal sites | `Fetcher` | (default) |
| JS-rendered SPAs | `DynamicFetcher` | `--dynamic` |
| Anti-bot protected | `StealthyFetcher` | `--stealth` |
| Cloudflare Turnstile | `StealthyFetcher` | `--stealth --solve-cloudflare` |
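A natural way to use these tiers is to escalate only when a cheaper fetcher fails: plain HTTP first, then a browser, then stealth. A minimal sketch of that strategy, where the three `fetch_*` functions are hypothetical stand-ins (not Scrapling calls) that simulate each tier:

```python
# Escalation sketch: try the cheapest tier first, fall back on failure.
# The three fetch_* functions are stand-ins simulating tier outcomes.

def fetch_http(url):
    raise RuntimeError("403: blocked")       # pretend the site blocks plain HTTP

def fetch_dynamic(url):
    raise RuntimeError("challenge page")     # pretend JS rendering isn't enough

def fetch_stealth(url):
    return f"<html>content of {url}</html>"  # stealth tier succeeds

def fetch_with_fallback(url):
    tiers = [fetch_http, fetch_dynamic, fetch_stealth]
    for tier in tiers:
        try:
            return tier(url)
        except RuntimeError:
            continue                         # escalate to the next tier
    raise RuntimeError(f"all tiers failed for {url}")

print(fetch_with_fallback("https://example.com"))
# -> <html>content of https://example.com</html>
```

Cheaper tiers are orders of magnitude faster, so escalating on demand keeps crawls fast on ordinary pages while still handling protected ones.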
## Features
- 3 fetcher tiers: HTTP → Dynamic Browser → Stealth Browser
- Adaptive element tracking: Elements survive site redesigns via fingerprinting
- Spider framework: Scrapy-like concurrent crawling with pause/resume
- Anti-bot bypass: Cloudflare Turnstile, CAPTCHAs, TLS fingerprinting
- Proxy rotation: Built-in `ProxyRotator` with cyclic or custom strategies
- Session management: Persistent cookies/state across requests
- Async support: All fetchers have async variants
- Multiple output formats: JSON, JSONL, CSV, text, Markdown, HTML
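The cyclic proxy strategy mentioned above is simple to picture: each request takes the next proxy from a repeating sequence. A standard-library sketch of the concept (this illustrates the strategy, not `ProxyRotator`'s actual API):

```python
import itertools

# Conceptual cyclic proxy rotation -- the strategy, not the ProxyRotator API.
proxies = [
    "http://proxy-a:8080",
    "http://proxy-b:8080",
    "http://proxy-c:8080",
]
rotation = itertools.cycle(proxies)

# Each request picks the next proxy; after the last one, it wraps around.
picked = [next(rotation) for _ in range(5)]
print(picked)
# -> ['http://proxy-a:8080', 'http://proxy-b:8080', 'http://proxy-c:8080',
#     'http://proxy-a:8080', 'http://proxy-b:8080']
```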
## Requirements
- OpenClaw instance (get one at myclaw.ai)
- Python 3.10+
- ~2GB disk space (for browsers)
## Credits
Built on top of the excellent Scrapling framework by Karim Shoair.
## License
MIT
MyClaw.ai — Your AI, Your Server, Your Rules.