ScribeCLI

A Node.js CLI tool that turns a blog into a clean Markdown content library. It scrapes posts from a blog list URL, downloads their images locally, and formats frontmatter according to a template you provide.

Features

Custom Frontmatter: Define your frontmatter structure using a demo.md template.
Image Downloading: Automatically downloads hero and content images to a local folder and updates links.
Smart Cleanup: Removes duplicate titles, metadata blocks (author/date), and site-specific footer content ("Recent Posts", "Follow us").
Infinite Scroll: Scrolls list pages that load posts on demand so that all posts are captured.
Configurable Selectors: Use interactive prompts or a JSON config file for CSS selectors.
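
To illustrate the "updates links" part of Image Downloading, here is a minimal sketch of how remote image links in scraped Markdown could be rewritten to local paths. It is not taken from ScribeCLI's source: the function names `localImagePath` and `rewriteImageLinks`, the `images` folder name, and the example URL are all assumptions made for illustration.

```javascript
const path = require("node:path");

// Hypothetical helper: maps a remote image URL to a path inside imageDir,
// using only the file name from the URL path (query strings are dropped).
function localImagePath(imageUrl, imageDir) {
  const fileName = path.posix.basename(new URL(imageUrl).pathname);
  return path.posix.join(imageDir, fileName);
}

// Rewrites Markdown image links of the form ![alt](http...) to local paths.
// Returns the rewritten Markdown plus the list of files still to download.
function rewriteImageLinks(markdown, imageDir) {
  const downloads = [];
  const rewritten = markdown.replace(
    /!\[([^\]]*)\]\((https?:\/\/[^)\s]+)\)/g,
    (match, alt, url) => {
      const localPath = localImagePath(url, imageDir);
      downloads.push({ url, localPath });
      return `![${alt}](${localPath})`;
    }
  );
  return { markdown: rewritten, downloads };
}

// Example with a made-up URL and an assumed "images" folder.
const result = rewriteImageLinks(
  "Intro ![Hero](https://example.com/media/hero.png) text",
  "images"
);
console.log(result.markdown);
console.log(result.downloads);
```

Separating the rewrite from the actual download keeps this step a pure function, which makes it easy to test without network access.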

Installation

npm install

Usage

Run the tool with the blog URL and your template path:

node index.js <blog_list_url> <path_to_demo.md> [options]

Options

--config, -c: Path to a JSON configuration file containing CSS selectors (bypasses interactive prompts).

Example

node index.js https://www.bonnpark.com/blog demo.md --config bonnpark_config.json
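
For readers curious how a command line of this shape can be handled, the following sketch parses two positional arguments and a `--config` / `-c` option with Node's built-in `util.parseArgs` (available in recent Node.js releases). Whether ScribeCLI itself uses `parseArgs` is not stated here; `parseCliArgs` and the returned property names are hypothetical.

```javascript
const { parseArgs } = require("node:util");

// Parses: <blog_list_url> <path_to_demo.md> [--config|-c <file>]
// Hypothetical helper; the real tool may parse its arguments differently.
function parseCliArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      config: { type: "string", short: "c" },
    },
    allowPositionals: true,
  });
  const [listUrl, templatePath] = positionals;
  if (!listUrl || !templatePath) {
    throw new Error(
      "Usage: node index.js <blog_list_url> <path_to_demo.md> [options]"
    );
  }
  // configPath is null when --config is omitted (interactive prompts apply).
  return { listUrl, templatePath, configPath: values.config ?? null };
}

// In a real entry point this would be parseCliArgs(process.argv.slice(2)).
console.log(
  parseCliArgs([
    "https://www.bonnpark.com/blog",
    "demo.md",
    "--config",
    "bonnpark_config.json",
  ])
);
```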

Configuration

`demo.md`

Create a sample Markdown file containing the frontmatter fields you want to extract. The tool parses the keys from this template and scrapes a value for each of them.

---
title: ""
date: ""
image: ""
categories: []
tags: []
author: ""
---
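
As a rough illustration of what "parsing these keys" can mean, the sketch below reads the top-level keys from a template like the one above. It assumes simple `key: value` lines between two `---` delimiters and is not ScribeCLI's actual parser; `frontmatterKeys` is a hypothetical name.

```javascript
// Extracts top-level frontmatter keys from template text.
// Assumption: the template starts with "---" and each field is a plain
// "key: value" line; nested YAML is not handled by this sketch.
function frontmatterKeys(templateText) {
  const lines = templateText.split(/\r?\n/).map((line) => line.trimEnd());
  if (lines[0] !== "---") return [];
  const end = lines.indexOf("---", 1);
  if (end === -1) return [];
  const keys = [];
  for (const line of lines.slice(1, end)) {
    const match = /^([A-Za-z_][\w-]*)\s*:/.exec(line);
    if (match) keys.push(match[1]);
  }
  return keys;
}

// The template from this README, inlined so the example runs on its own.
const template = [
  "---",
  'title: ""',
  'date: ""',
  'image: ""',
  "categories: []",
  "tags: []",
  'author: ""',
  "---",
  "",
].join("\n");

console.log(frontmatterKeys(template));
```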

`config.json` (Optional)

Navigate the interactive prompts once, or create a JSON file with your selectors:

{
  "postLinkSelector": "a[href*='/post/']",
  "fm_title": "h1",
  "fm_date": "span.date",
  "fm_image": "img.hero",
  "contentSelector": "article"
}
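
The sample suggests that keys prefixed with `fm_` pair a frontmatter field with a CSS selector. Under that assumption, a config file could be sanity-checked before a long scrape with something like the sketch below. The function `checkSelectors`, the choice of which keys count as required, and the reporting format are all hypothetical and not part of the tool.

```javascript
// Checks a selector config against the frontmatter fields of a template.
// Assumptions: postLinkSelector and contentSelector are required, and a
// field named "title" is configured through the key "fm_title".
function checkSelectors(config, fields) {
  const missing = [];
  for (const key of ["postLinkSelector", "contentSelector"]) {
    if (typeof config[key] !== "string" || config[key].trim() === "") {
      missing.push(key);
    }
  }
  // Fields without an fm_ selector are reported, not treated as errors.
  const unmapped = fields.filter((field) => !(`fm_${field}` in config));
  return { missing, unmapped };
}

// The sample config from this README, inlined so the example runs on its own.
const config = JSON.parse(`{
  "postLinkSelector": "a[href*='/post/']",
  "fm_title": "h1",
  "fm_date": "span.date",
  "fm_image": "img.hero",
  "contentSelector": "article"
}`);

console.log(checkSelectors(config, ["title", "date", "image", "author"]));
```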

tfistiak/scribe-cli
