Umakantamaharana/multi-agent-systems-for-rheumatoid-arthritis-diagnosis

Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis

Multi-Agent Systems for Rheumatoid Arthritis Diagnosis (RPWR)

Project Website: https://umakantamaharana.github.io/rpwr.github.io/
arXiv Paper: arXiv:2504.06581v1 [cs.AI]

Authors

  • Umakanta Maharana - RespAI Lab, KIIT Bhubaneswar
  • Sarthak Verma - KIMS Bhubaneswar
  • Avarna Agarwal - KIMS Bhubaneswar
  • Prakashini Mruthyunjaya - KIMS Bhubaneswar
  • Dwarikanath Mahapatra - Monash University, Australia
  • Sakir Ahmed - KIMS Bhubaneswar
  • Murari Mandal - RespAI Lab, KIIT Bhubaneswar

Correspondence: Murari Mandal


🎯 Project Overview

This research project explores multi-agent LLM systems for medical diagnosis, specifically focused on Rheumatoid Arthritis (RA) screening and diagnosis. The study investigates the phenomenon of "Right Prediction, Wrong Reasoning" in LLM-based medical diagnosis systems.

Key Highlights

  • Dataset: PreRAID, comprising 160 patient records from KIMS, Bhubaneswar
  • Diagnosis Accuracy: LLMs predicted RA with 95% accuracy
  • Reasoning Validation: Expert review found flawed reasoning in 68% of cases, even where the prediction was correct
  • Implications: Highlights the critical need for reliable reasoning in clinical AI tools
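The gap between prediction quality and reasoning quality can be made concrete with two separate metrics. The sketch below is illustrative only: the `ReviewedCase` schema, the label strings, and the toy records are assumptions, and it assumes the flawed-reasoning rate is measured among correctly predicted cases, which may differ from the denominator used in the paper.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCase:
    """One patient record after expert review (hypothetical schema)."""
    predicted: str         # label produced by the LLM, e.g. "RA" or "Non-RA"
    actual: str            # ground-truth label
    reasoning_valid: bool  # expert judgment of the LLM's explanation


def accuracy(cases):
    """Fraction of cases where the predicted label matches the ground truth."""
    return sum(c.predicted == c.actual for c in cases) / len(cases)


def right_prediction_wrong_reasoning_rate(cases):
    """Among correct predictions, fraction whose reasoning was judged flawed."""
    correct = [c for c in cases if c.predicted == c.actual]
    if not correct:
        return 0.0
    return sum(not c.reasoning_valid for c in correct) / len(correct)


# Toy data for illustration only; not taken from PreRAID.
toy_cases = [
    ReviewedCase("RA", "RA", True),
    ReviewedCase("RA", "RA", False),
    ReviewedCase("Non-RA", "Non-RA", False),
    ReviewedCase("RA", "Non-RA", False),
]
```

Reporting both numbers side by side keeps a high accuracy score from hiding explanations that experts would reject.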

This project implements and evaluates different multi-agent architectures using Large Language Models (LLMs) to diagnose Rheumatoid Arthritis from patient symptom data, exploring various agent configurations with and without knowledge base integration.

Key Features

  • Multiple Agent Architectures: Single agent, dual agent, and three-agent systems
  • Knowledge Base Integration: RAG (Retrieval-Augmented Generation) using ChromaDB
  • Multi-Model Support: Compatible with OpenAI (gpt-4, o1, o3-mini), Google Gemini (2.0, 2.5), and local models via Ollama (DeepSeek, Qwen)
  • Comprehensive Evaluation: Automated testing and result analysis across different model configurations

📁 Project Structure

multi-agent-systems-for-rheumatoid-arthritis-diagnosis/
├── .env                   # Environment variables (API keys)
├── .env.example           # Template for environment setup
├── .gitignore             # Git ignore rules
├── README.md              # This file
├── requirements.txt       # Python dependencies
├── run.sh                 # Main execution script
├── data/                  # Dataset files
│   ├── knowledge_base_280.csv      # Medical knowledge base
│   ├── preprocessed_data_350.csv   # Preprocessed patient data
│   ├── test_70.csv                 # Test dataset
│   └── test_data.csv               # Additional test data
├── knowledge_base/        # ChromaDB vector database
│   ├── chroma.sqlite3
│   └── f0eede21-c8e5-4813-91e1-93fb19985e5d/
├── notebooks/             # Jupyter notebooks for analysis
│   └── data_processing.ipynb       # Data cleaning and processing
├── results/               # Experiment results
│   ├── agent_kb/          # Results with knowledge base
│   ├── agent_wkb/         # Results without knowledge base
│   └── two_agent_kb/      # Two-agent system results
└── scripts/               # Python scripts
    ├── agent_without_kb.py         # Single agent without KB
    ├── dataset.py                  # Dataset utilities
    ├── one_agent_with_kb.py        # Single agent with KB
    ├── three_agent_with_kb.py      # Three-agent system
    └── two_agent_with_kb.py        # Two-agent system

🚀 Setup

Prerequisites

  • Python 3.8 or higher
  • Virtual environment (recommended)
  • API keys for OpenAI and/or Google Gemini (or Ollama for local models)

Installation

  1. Clone or navigate to the repository:

    cd multi-agent-systems-for-rheumatoid-arthritis-diagnosis
  2. Create and activate a virtual environment:

    # Linux/Mac
    python3 -m venv venv
    source venv/bin/activate
    
    # Windows
    python -m venv venv
    venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables:

    cp .env.example .env
    # Edit .env and add your API keys
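Before launching a long run, it can help to confirm that the key for the chosen provider is actually set. The following is a minimal sketch, not part of the repository: the variable names `OPENAI_API_KEY` and `GOOGLE_API_KEY` are assumptions, so check `.env.example` for the names the scripts really read.

```python
import os

# Assumed variable names; .env.example is the authoritative list.
REQUIRED_KEYS = {
    "OPENAI": "OPENAI_API_KEY",
    "GOOGLE": "GOOGLE_API_KEY",
    "OLLAMA": None,  # local models are assumed to need no API key
}


def missing_key(provider, env=None):
    """Return the name of the unset variable for a provider, or None if nothing is missing."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_KEYS:
        raise ValueError(f"unknown provider: {provider}")
    name = REQUIRED_KEYS[provider]
    if name is None or env.get(name):
        return None
    return name
```

Calling `missing_key("OPENAI")` at the top of a script lets it fail early with a clear message instead of partway through a batch of patient records.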

💻 Usage

Running Individual Scripts

1. Agent Without Knowledge Base

python scripts/agent_without_kb.py --provider GOOGLE --model gemini-2.0-flash --results_dir results/agent_wkb

2. Agent With Knowledge Base

python scripts/one_agent_with_kb.py --provider GOOGLE --model gemini-2.5-pro-preview-03-25 --results_dir results/agent_kb

3. Two-Agent System

python scripts/two_agent_with_kb.py --provider OPENAI --model o1 --results_dir results/two_agent_kb
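All three commands share the same flags. As a rough sketch of how such an interface could be parsed, assuming every flag is required and that each run writes `<results_dir>/<model>.csv` (the layout shown under Results), the helper names `build_parser` and `output_path` are hypothetical and the scripts' real argument handling may differ:

```python
import argparse
from pathlib import Path

PROVIDERS = ("OPENAI", "GOOGLE", "OLLAMA")


def build_parser():
    """Parser for the three flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Run an RA diagnosis agent")
    parser.add_argument("--provider", choices=PROVIDERS, required=True)
    parser.add_argument("--model", required=True)
    parser.add_argument("--results_dir", type=Path, required=True)
    return parser


def output_path(args):
    """Assumed results file for a run: <results_dir>/<model>.csv."""
    return args.results_dir / f"{args.model}.csv"


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--provider", "OPENAI", "--model", "o1", "--results_dir", "results/two_agent_kb"]
)
```

Restricting `--provider` with `choices` makes a misspelled provider fail at parse time rather than after the model client is constructed.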

Using the Run Script

The run.sh script provides a convenient way to run multiple experiments:

chmod +x run.sh
./run.sh

Edit run.sh to configure which models and providers to test.
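For readers who prefer to drive batches from Python, the same idea can be sketched as a small experiment grid. This is a hypothetical equivalent, not a transcription of `run.sh`; the grid below simply reuses script paths and model names that appear elsewhere in this README.

```python
import shlex

# Hypothetical experiment grid; the actual contents of run.sh may differ.
EXPERIMENTS = [
    ("scripts/agent_without_kb.py", "GOOGLE", "gemini-2.0-flash", "results/agent_wkb"),
    ("scripts/two_agent_with_kb.py", "OPENAI", "o1", "results/two_agent_kb"),
]


def build_command(script, provider, model, results_dir):
    """Return the argv list for one experiment."""
    return [
        "python", script,
        "--provider", provider,
        "--model", model,
        "--results_dir", results_dir,
    ]


commands = [build_command(*experiment) for experiment in EXPERIMENTS]

# Print the commands rather than executing them, so the sketch has no side effects.
for command in commands:
    print(shlex.join(command))
```

To execute the runs for real, each list could be passed to `subprocess.run(command, check=True)`, which stops the batch at the first failing experiment.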

Supported Providers and Models

| Provider | Models | Notes |
|----------|--------|-------|
| OPENAI | o1, o3-mini, gpt-4 | Requires OpenAI API key |
| GOOGLE | gemini-2.0-flash, gemini-2.5-pro-preview-03-25 | Requires Google API key |
| OLLAMA | deepseek-r1:70b, qwq | Requires local Ollama installation |

🏗️ Agent Architectures

1. Single Agent Without Knowledge Base

  • Direct diagnosis from patient symptoms
  • No external medical knowledge integration
  • Baseline for comparison

2. Single Agent With Knowledge Base

  • RAG-enhanced diagnosis
  • Retrieves relevant medical information from ChromaDB
  • Improved accuracy with domain knowledge
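The retrieve-then-prompt flow can be illustrated without a vector database. The sketch below deliberately swaps ChromaDB's embedding search for a plain bag-of-words cosine similarity so that it runs with the standard library alone; the knowledge snippets, the prompt wording, and the function names are placeholders, not content from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(symptoms, documents, k=2):
    """Prepend the retrieved notes to the patient's symptoms."""
    context = "\n".join(f"- {doc}" for doc in retrieve(symptoms, documents, k))
    return (
        f"Reference notes:\n{context}\n\n"
        f"Patient symptoms: {symptoms}\n"
        "Is this consistent with RA? Explain."
    )


# Placeholder snippets for illustration; not taken from the project's knowledge base.
KB = [
    "Morning stiffness lasting over an hour with symmetric small joint swelling suggests inflammatory arthritis.",
    "Knee pain that worsens with activity and improves with rest suggests osteoarthritis.",
    "Fever with a single hot swollen joint requires urgent evaluation for septic arthritis.",
]
```

In the actual pipeline the ranking would come from the ChromaDB collection, but the shape of the step is the same: rank knowledge entries against the symptoms, then place the top hits in the prompt.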

3. Two-Agent System

  • Agent 1: Diagnosis agent
  • Agent 2: Reasoning agent
  • Collaborative decision-making process

4. Three-Agent System

  • Extended multi-agent collaboration
  • More complex reasoning chains
  • (Implementation in progress)

📊 Results

Results are saved in CSV format in the results/ directory, organized by agent architecture and model:

results/
├── agent_kb/
│   ├── gemini-2.0-flash.csv
│   └── o1.csv
├── agent_wkb/
│   ├── gemini-2.5-pro-preview-03-25.csv
│   └── deepseek-r1:70b.csv
└── two_agent_kb/
    └── o3-mini.csv

Each CSV contains:

  • Patient symptoms
  • Predicted diagnosis
  • Reasoning/explanation
  • Actual diagnosis (ground truth)
  • Model metadata
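A results file with these fields can be summarized with the standard library. The column headers in this sketch (`predicted_diagnosis`, `actual_diagnosis`, and the rest) are assumptions, as are the toy rows; adjust the constants to match the headers in the real CSV files.

```python
import csv
import io
from collections import Counter

# Assumed column names; the real result files may use different headers.
PRED_COL = "predicted_diagnosis"
TRUTH_COL = "actual_diagnosis"


def confusion_counts(handle):
    """Count (actual, predicted) label pairs in a results CSV."""
    reader = csv.DictReader(handle)
    return Counter((row[TRUTH_COL].strip(), row[PRED_COL].strip()) for row in reader)


# Toy rows for illustration only; not real experiment output.
sample = io.StringIO(
    "symptoms,predicted_diagnosis,reasoning,actual_diagnosis,model\n"
    "joint pain,RA,stub,RA,o1\n"
    "knee pain,RA,stub,Non-RA,o1\n"
)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `confusion_counts`. The pair counts show which kind of error a model makes, which a single accuracy figure does not.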

🔬 Research Context

This project investigates the phenomenon of "Right Prediction, Wrong Reasoning" in LLM-based medical diagnosis systems. Key research questions include:

  • How do different agent architectures affect diagnostic accuracy?
  • Does knowledge base integration improve reasoning quality?
  • Can multi-agent systems provide better explanations?
  • How do different LLM models compare in medical reasoning tasks?

📝 Dependencies

Key dependencies (see requirements.txt for full list):

  • langchain-chroma: Vector database for knowledge base
  • langchain-openai: OpenAI model integration
  • langchain-google-genai: Google Gemini integration
  • langchain-ollama: Local model support
  • langgraph: Multi-agent orchestration
  • pandas: Data processing
  • scikit-learn: ML utilities
  • matplotlib: Visualization

🤝 Contributing

This is a research project. If you'd like to contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 Citation

If you use the PreRAID dataset or this codebase in your research, please cite our paper:

@misc{maharana2025rightpredictionwrongreasoning,
      title={Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis}, 
      author={Umakanta Maharana and Sarthak Verma and Avarna Agarwal and Prakashini Mruthyunjaya and Dwarikanath Mahapatra and Sakir Ahmed and Murari Mandal},
      year={2025},
      eprint={2504.06581},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.06581}, 
}

📧 Contact

For questions or collaborations, please contact the corresponding author, Murari Mandal.

🙏 Acknowledgments

This research is supported by the Science and Engineering Research Board (SERB), India under Grant SRG/2023/001686.

This project uses:

  • LangChain for LLM orchestration
  • ChromaDB for vector storage
  • OpenAI, Google, and Ollama for LLM inference

Parts of the project website were adapted from the Nerfies page.

📜 License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Note: This is a research project for educational and experimental purposes. It should not be used for actual medical diagnosis without proper validation and regulatory approval.