smithjw1/blog-relevancy

WordPress XML Processor for Real-Time Collaboration Mentions

This script processes a WordPress XML export file to identify posts that discuss real-time collaboration, multiplayer features, block-level locking, collaborative editing, or similar concepts. It uses a locally running Ollama instance to analyze post content and outputs the findings to a JSON file.

Prerequisites

  • Node.js (v18.x or later recommended)
  • A locally running Ollama instance with a suitable model already pulled (e.g., by running ollama pull llama3)
  • A WordPress XML export file.
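
Before a long run it can help to confirm that Ollama is reachable and that the model is actually available. The following is a minimal sketch, not part of the script itself. It assumes Ollama is listening on its default address (http://localhost:11434) and uses the /api/tags endpoint, which lists locally pulled models. The names hasModel and checkOllama are hypothetical helpers.

```javascript
// Assumption: Ollama runs on its default host and port.
const OLLAMA_TAGS_URL = "http://localhost:11434/api/tags";

// Return true if the /api/tags payload lists the requested model.
// Ollama reports names with a tag suffix (e.g. "llama3:latest"), so a bare
// model name matches any tag of that model.
function hasModel(tagsPayload, model) {
  const names = (tagsPayload.models ?? []).map((m) => m.name);
  return names.some((name) => name === model || name.startsWith(`${model}:`));
}

// Query the local Ollama instance; fetchImpl is injectable for testing.
async function checkOllama(model, fetchImpl = fetch) {
  const res = await fetchImpl(OLLAMA_TAGS_URL);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return hasModel(await res.json(), model);
}

// Usage (requires a running Ollama instance):
// checkOllama("llama3").then((ok) => console.log(ok ? "model ready" : "model missing"));
```

If the request itself fails, the instance is most likely not running. If it resolves to false, pull the model first.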

Setup

  1. Clone or download this script into a directory.
  2. Place your WordPress XML export file (e.g., your_export.xml) in the root of this directory, or note its location; the script takes the file path as a command-line argument.
  3. Install the necessary dependencies:
    npm install

Configuration

Before running the script, you might need to adjust the following constants within process_wordpress_export.js:

  • OLLAMA_MODEL: Specify the Ollama model you want to use (e.g., 'llama3').
  • OUTPUT_JSON_FILE: The name of the JSON output file (default: rtc_posts.json).

Usage

To run the script, provide the path to your WordPress XML export file as a command-line argument:

node process_wordpress_export.js /path/to/your/wordpress_export.xml

The script reads the specified XML file, analyzes each post using Ollama, and saves the posts identified as relevant to the file named by OUTPUT_JSON_FILE (rtc_posts.json by default).
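
The command-line handling and the final write step described above could look roughly like this. It is a sketch under assumptions, not the script's actual code: resolveInputPath and saveRelevantPosts are hypothetical names, and each analyzed post is assumed to carry a boolean relevant field.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed default, matching the file name used in this README.
const OUTPUT_JSON_FILE = "rtc_posts.json";

// Read the XML path from the command line (argv[2] is the first user argument).
function resolveInputPath(argv) {
  const xmlPath = argv[2];
  if (!xmlPath) {
    throw new Error("Usage: node process_wordpress_export.js <path-to-export.xml>");
  }
  return path.resolve(xmlPath);
}

// Keep only posts flagged as relevant and write them as pretty-printed JSON.
// Returns the number of posts written.
function saveRelevantPosts(results, outputFile = OUTPUT_JSON_FILE) {
  const relevant = results.filter((r) => r.relevant);
  fs.writeFileSync(outputFile, JSON.stringify(relevant, null, 2) + "\n", "utf8");
  return relevant.length;
}

// Usage:
// const xmlPath = resolveInputPath(process.argv);
// ...parse and analyze posts...
// saveRelevantPosts(results);
```

Failing early when the path is missing avoids starting an Ollama run that cannot produce any output.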

Output

  • A JSON file (default: rtc_posts.json) containing structured data for each post identified as relevant.

Validation and Accuracy

The script relies on the Ollama AI model to determine if a post is relevant to real-time collaboration. The accuracy of this determination depends on several factors:

  • The quality and capabilities of the selected Ollama model
  • The clarity of the prompt provided to the model
  • The content and structure of the posts being analyzed
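
To make the role of the prompt concrete, here is a minimal sketch of how a post might be sent to Ollama and the reply interpreted. It is not the repository's analyzePostWithOllama function. The prompt wording, the response shape ({"relevant", "reason"}), and the names buildPrompt, parseVerdict, and classifyPost are assumptions for illustration. The request uses Ollama's /api/generate endpoint with streaming disabled and JSON output requested.

```javascript
// Assumed defaults; the real script may use different values.
const OLLAMA_URL = "http://localhost:11434/api/generate";
const OLLAMA_MODEL = "llama3";

// Build a prompt that asks for a strict JSON verdict (hypothetical wording).
function buildPrompt(post) {
  return [
    "Does the following blog post discuss real-time collaboration,",
    "multiplayer features, block-level locking, or collaborative editing?",
    'Answer with JSON only: {"relevant": true or false, "reason": "<one sentence>"}',
    "",
    `Title: ${post.title}`,
    `Content: ${post.content}`,
  ].join("\n");
}

// Parse the model's reply; anything unparseable counts as "not relevant".
function parseVerdict(text) {
  try {
    const parsed = JSON.parse(text);
    return { relevant: parsed.relevant === true, reason: String(parsed.reason ?? "") };
  } catch {
    return { relevant: false, reason: "unparseable model response" };
  }
}

// Send one post to Ollama; fetchImpl is injectable for testing.
async function classifyPost(post, fetchImpl = fetch) {
  const res = await fetchImpl(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: OLLAMA_MODEL,
      prompt: buildPrompt(post),
      format: "json", // ask Ollama to constrain the output to valid JSON
      stream: false, // return one complete response instead of chunks
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return parseVerdict(data.response);
}
```

Treating unparseable replies as not relevant is a conservative choice that favors precision. Logging those posts for manual review would be a reasonable alternative.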

If you need higher accuracy or want to validate the results:

  1. Consider using a more capable model (e.g., mixtral:8x22b, gemma2:27b, or mistral-large)
  2. Review the generated JSON output manually
  3. Adjust the prompt in the analyzePostWithOllama function to be more specific to your use case
  4. Implement a secondary validation step that re-analyzes borderline posts, for example those with a low confidence score if your prompt asks the model to return one
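
A secondary validation pass could be sketched as follows. Everything here is an assumption: the script is not known to produce confidence scores, so the confidence field (0 to 1), the 0.7 threshold, and the names selectForReview and revalidate are hypothetical. The classify argument stands in for whatever function performs the second analysis, ideally with a stronger model or a stricter prompt.

```javascript
// Assumed cutoff below which a result is re-checked.
const CONFIDENCE_THRESHOLD = 0.7;

// A result is uncertain if it has no numeric confidence or falls below the cutoff.
function isUncertain(result, threshold) {
  return typeof result.confidence !== "number" || result.confidence < threshold;
}

// List the results that would be sent for a second opinion.
function selectForReview(results, threshold = CONFIDENCE_THRESHOLD) {
  return results.filter((r) => isUncertain(r, threshold));
}

// Re-analyze uncertain results; a post stays relevant only if both passes agree.
async function revalidate(results, classify, threshold = CONFIDENCE_THRESHOLD) {
  const out = [];
  for (const r of results) {
    if (!isUncertain(r, threshold)) {
      out.push(r);
      continue;
    }
    const second = await classify(r.post);
    out.push({ ...r, relevant: r.relevant && second.relevant === true, revalidated: true });
  }
  return out;
}
```

Requiring agreement between both passes trades some recall for fewer false positives. The posts flagged by selectForReview are also good candidates for the manual review in step 2.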