ScribeCLI
A powerful Node.js CLI tool that turns any blog into a clean Markdown content library. It scrapes posts from a list URL, downloads all images locally, and formats frontmatter exactly how you need it.
Features
- Custom Frontmatter: Define your frontmatter structure using a
demo.mdtemplate. - Image Downloading: Automatically downloads hero and content images to a local folder and updates links.
- Smart Cleanup: Removes duplicate titles, metadata blocks (author/date), and site-specific footer content ("Recent Posts", "Follow us").
- Infinite Scroll: robustly scrolls list pages to capture all posts.
- Configurable Selectors: Use interactive prompts or a JSON config file for CSS selectors.
Installation
npm installUsage
Run the tool with the blog URL and your template path:
node index.js <blog_list_url> <path_to_demo.md> [options]Options
--config,-c: Path to a JSON configuration file containing CSS selectors (bypasses interactive prompts).
Example
node index.js https://www.bonnpark.com/blog demo.md --config bonnpark_config.jsonConfiguration
demo.md
Create a sample markdown file with the frontmatter fields you want to extract. The tool will parse these keys and scrape them.
---
title: ""
date: ""
image: ""
categories: []
tags: []
author: ""
---config.json (Optional)
Navigate the interactive prompts once, or create a JSON file with your selectors:
{
"postLinkSelector": "a[href*='/post/']",
"fm_title": "h1",
"fm_date": "span.date",
"fm_image": "img.hero",
"contentSelector": "article"
}