WordPress XML Processor for Real-Time Collaboration Mentions

This script processes a WordPress XML export file to identify posts that discuss real-time collaboration, multiplayer features, block-level locking, collaborative editing, or similar concepts. It uses a locally running Ollama instance to analyze post content and outputs the findings to a JSON file.

Prerequisites

Node.js (v18.x or later recommended)
A locally running Ollama instance with a suitable model pulled (e.g., ollama pull llama3)
A WordPress XML export file (generated from Tools > Export in the WordPress admin)
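Before the first run, it can help to confirm that Ollama is reachable and that the model is pulled. This is a minimal sketch. It assumes Ollama's default address (http://localhost:11434) and uses Ollama's GET /api/tags endpoint, which lists locally available models. The function name isModelAvailable is hypothetical and is not part of this script.

```javascript
// Sketch: check that a local Ollama server is reachable and a model is pulled.
// Assumption: Ollama listens on its default address http://localhost:11434.
const OLLAMA_URL = 'http://localhost:11434';

// Hypothetical helper; fetchImpl defaults to Node 18's global fetch.
async function isModelAvailable(model, fetchImpl = fetch) {
  const res = await fetchImpl(`${OLLAMA_URL}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const body = await res.json();
  // Local model names carry a tag (e.g. "llama3:latest"), so accept an exact
  // match or a bare name that equals the part before the colon.
  return (body.models || []).some(
    (m) => m.name === model || m.name.split(':')[0] === model
  );
}
```

For example, isModelAvailable('llama3') should resolve to true once ollama pull llama3 has completed, and reject if the server is not running.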

Setup

Clone or download this script into a directory.
Optionally, place your WordPress XML export file (e.g., your_export.xml) in this directory; the script accepts any path on the command line.
Install the necessary dependencies:
```
npm install
```

Configuration

Before running the script, you might need to adjust the following constants within process_wordpress_export.js:

OLLAMA_MODEL: The Ollama model to use (e.g., 'llama3'). It must already be pulled locally.
OUTPUT_JSON_FILE: The name of the JSON output file (default: rtc_posts.json).
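If you change these values often, one option is to let environment variables override them. This is a sketch only. The environment variable names and the 'llama3' default are assumptions for illustration; the script itself may simply hard-code both constants.

```javascript
// Sketch: the two configuration constants, with hypothetical environment
// variable overrides. The original script may hard-code these values instead.
function loadConfig(env = process.env) {
  return {
    // Ollama model tag, e.g. 'llama3' or 'gemma2:27b' (default is an assumption)
    OLLAMA_MODEL: env.OLLAMA_MODEL || 'llama3',
    // File that receives the relevant posts; rtc_posts.json per this README
    OUTPUT_JSON_FILE: env.OUTPUT_JSON_FILE || 'rtc_posts.json',
  };
}

const { OLLAMA_MODEL, OUTPUT_JSON_FILE } = loadConfig();
```

With this approach, OLLAMA_MODEL=gemma2:27b node process_wordpress_export.js export.xml would switch models without editing the file.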

Usage

To run the script, provide the path to your WordPress XML export file as a command-line argument:

```
node process_wordpress_export.js /path/to/your/wordpress_export.xml
```
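Reading the path from the command line might look like the following sketch. In Node.js, process.argv[0] is the node binary and process.argv[1] is the script, so the user-supplied path is process.argv[2]. The helper name getInputPath and its error messages are hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch: resolve the export path passed as
// `node process_wordpress_export.js /path/to/export.xml`.
// Hypothetical helper; argv[2] is the first user-supplied argument.
function getInputPath(argv = process.argv) {
  const input = argv[2];
  if (!input) {
    throw new Error('Usage: node process_wordpress_export.js <export.xml>');
  }
  // Resolve relative paths against the current working directory
  const resolved = path.resolve(input);
  if (!fs.existsSync(resolved)) {
    throw new Error(`File not found: ${resolved}`);
  }
  return resolved;
}
```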

The script reads the specified XML file, analyzes each post with Ollama, and saves the posts identified as relevant to the output JSON file (rtc_posts.json by default).
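The first step, reading posts out of the export, can be sketched as follows. A WordPress export (WXR) stores each entry in an item element with title, link, content:encoded, and wp:post_type children, often wrapped in CDATA. This simplified sketch uses regular expressions rather than a real XML parser, so it assumes well-formed, standard WXR output; the function names are hypothetical and the script's actual parsing may differ.

```javascript
// Sketch: extract posts from WordPress export (WXR) XML.
// Simplified: regex-based, assumes standard well-formed WXR. A robust
// implementation would use a proper XML parsing library.
function textOf(block, tag) {
  const match = block.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  if (!match) return '';
  // Unwrap an optional CDATA section around the element's text
  return match[1].trim().replace(/^<!\[CDATA\[([\s\S]*?)\]\]>$/, '$1');
}

function extractPosts(xml) {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  return items
    .map((item) => ({
      title: textOf(item, 'title'),
      link: textOf(item, 'link'),
      type: textOf(item, 'wp:post_type'),
      content: textOf(item, 'content:encoded'),
    }))
    // WXR also contains pages, attachments, and menu items; keep only posts
    .filter((post) => post.type === 'post');
}
```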

Output

A JSON file (default: rtc_posts.json, set by OUTPUT_JSON_FILE) containing structured data for each post identified as relevant.
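Writing that file could look like this sketch. The record shape (title, link, reason) is a hypothetical example; check the script for the fields it actually stores.

```javascript
const fs = require('node:fs');

// Sketch: write relevant posts to the output JSON file.
// The { title, link, reason } record shape is hypothetical.
function writeResults(outputFile, posts) {
  const records = posts.map(({ title, link, reason }) => ({ title, link, reason }));
  // Two-space indentation keeps the file easy to review by hand
  fs.writeFileSync(outputFile, JSON.stringify(records, null, 2) + '\n', 'utf8');
  return records.length;
}
```

Dropping the full post body from each record keeps the output small enough to review manually.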

Validation and Accuracy

The script relies on the selected Ollama model to decide whether a post is relevant to real-time collaboration. The accuracy of that decision depends on several factors:

The quality and capabilities of the selected Ollama model
The clarity of the prompt provided to the model
The content and structure of the posts being analyzed
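To show how the model, the prompt, and the post content interact, here is a sketch of a single classification call. It uses Ollama's POST /api/generate endpoint with stream: false (one complete reply) and format: 'json' (JSON-only output), reading the answer from the reply's response field. The prompt wording, the {relevant, confidence, reason} reply shape, the 8000-character cap, and the names buildPrompt and classifyPost are assumptions. This is not the prompt used in analyzePostWithOllama.

```javascript
// Sketch: ask a local Ollama model whether a post covers real-time collaboration.
// Assumptions: default Ollama address, hypothetical prompt and reply shape.
const OLLAMA_GENERATE_URL = 'http://localhost:11434/api/generate';

function buildPrompt(post) {
  // Strip HTML tags, collapse whitespace, and cap length to fit the context window
  const text = post.content
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, 8000);
  return [
    'Does the following blog post discuss real-time collaboration, multiplayer',
    'features, block-level locking, or collaborative editing?',
    'Reply only with JSON: {"relevant": true|false, "confidence": 0-1, "reason": "..."}',
    '',
    `Title: ${post.title}`,
    `Content: ${text}`,
  ].join('\n');
}

async function classifyPost(post, model, fetchImpl = fetch) {
  const res = await fetchImpl(OLLAMA_GENERATE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: buildPrompt(post), stream: false, format: 'json' }),
  });
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  // The model's answer arrives as a string in the `response` field;
  // JSON.parse throws if the model ignored the requested format
  const { response } = await res.json();
  const verdict = JSON.parse(response);
  return {
    relevant: verdict.relevant === true,
    confidence: Number(verdict.confidence) || 0,
    reason: String(verdict.reason || ''),
  };
}
```

Asking for a fixed JSON shape makes the verdict machine-readable and gives each post a confidence value that later checks can use.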

If you need higher accuracy or want to validate the results:

Consider using a more capable model (e.g., mixtral:8x22b, gemma2:27b, or mistral-large)
Review the generated JSON output manually
Adjust the prompt in the analyzePostWithOllama function to be more specific to your use case
Implement a secondary validation step that re-analyzes posts with low confidence scores, if your prompt asks the model to report a confidence score
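A secondary validation pass could be sketched as follows, assuming each first-pass verdict carries a numeric confidence. Posts below a threshold are re-classified with a second model, and disagreements are flagged for manual review. The 0.6 threshold, the gemma2:27b default, and the name validateLowConfidence are illustrative assumptions. The classify argument is any async function of the form (post, model) that returns a verdict.

```javascript
// Sketch: re-check low-confidence verdicts with a second model and flag
// disagreements for manual review. Threshold and model are assumptions.
async function validateLowConfidence(results, classify, {
  threshold = 0.6,
  secondModel = 'gemma2:27b',
} = {}) {
  const validated = [];
  for (const { post, verdict } of results) {
    if (verdict.confidence >= threshold) {
      validated.push({ post, verdict, rechecked: false, needsReview: false });
      continue;
    }
    // Run sequentially so a single local Ollama instance is not overloaded
    const second = await classify(post, secondModel);
    validated.push({
      post,
      verdict: second,
      rechecked: true,
      // When the two models disagree, a human should decide
      needsReview: second.relevant !== verdict.relevant,
    });
  }
  return validated;
}
```

Flagging disagreements, rather than silently trusting the second model, narrows manual review to the posts where the models actually conflict.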

smithjw1/blog-relevancy