Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis
Multi-Agent Systems for Rheumatoid Arthritis Diagnosis (RPWR)
Project Website: https://umakantamaharana.github.io/rpwr.github.io/
arXiv Paper: arXiv:2504.06581v1 [cs.AI]
Authors
- Umakanta Maharana - RespAI Lab, KIIT Bhubaneswar
- Sarthak Verma - KIMS Bhubaneswar
- Avarna Agarwal - KIMS Bhubaneswar
- Prakashini Mruthyunjaya - KIMS Bhubaneswar
- Dwarikanath Mahapatra - Monash University, Australia
- Sakir Ahmed - KIMS Bhubaneswar
- Murari Mandal - RespAI Lab, KIIT Bhubaneswar
Correspondence: Murari Mandal
Project Overview
This research project explores multi-agent LLM systems for medical diagnosis, specifically focused on Rheumatoid Arthritis (RA) screening and diagnosis. The study investigates the phenomenon of "Right Prediction, Wrong Reasoning" in LLM-based medical diagnosis systems.
Key Highlights
- Dataset: PreRAID, comprising 160 patient records from KIMS, Bhubaneswar
- Diagnosis Accuracy: LLMs predicted RA with 95% accuracy
- Reasoning Validation: Expert review revealed 68% flawed reasoning despite correct predictions
- Implications: Highlights the critical need for reliable reasoning in clinical AI tools
This project implements and evaluates different multi-agent architectures using Large Language Models (LLMs) to diagnose Rheumatoid Arthritis from patient symptom data, exploring various agent configurations with and without knowledge base integration.
Key Features
- Multiple Agent Architectures: Single agent, dual agent, and three-agent systems
- Knowledge Base Integration: RAG (Retrieval-Augmented Generation) using ChromaDB
- Multi-Model Support: Compatible with OpenAI (GPT-4, o1, o3-mini), Google Gemini (2.0, 2.5), and local models via Ollama (DeepSeek, Qwen)
- Comprehensive Evaluation: Automated testing and result analysis across different model configurations
Project Structure
multi-agent-systems-for-rheumatoid-arthritis-diagnosis/
├── .env                            # Environment variables (API keys)
├── .env.example                    # Template for environment setup
├── .gitignore                      # Git ignore rules
├── README.md                       # This file
├── requirements.txt                # Python dependencies
├── run.sh                          # Main execution script
├── data/                           # Dataset files
│   ├── knowledge_base_280.csv      # Medical knowledge base
│   ├── preprocessed_data_350.csv   # Preprocessed patient data
│   ├── test_70.csv                 # Test dataset
│   └── test_data.csv               # Additional test data
├── knowledge_base/                 # ChromaDB vector database
│   ├── chroma.sqlite3
│   └── f0eede21-c8e5-4813-91e1-93fb19985e5d/
├── notebooks/                      # Jupyter notebooks for analysis
│   └── data_processing.ipynb       # Data cleaning and processing
├── results/                        # Experiment results
│   ├── agent_kb/                   # Results with knowledge base
│   ├── agent_wkb/                  # Results without knowledge base
│   └── two_agent_kb/               # Two-agent system results
└── scripts/                        # Python scripts
    ├── agent_without_kb.py         # Single agent without KB
    ├── dataset.py                  # Dataset utilities
    ├── one_agent_with_kb.py        # Single agent with KB
    ├── three_agent_with_kb.py      # Three-agent system
    └── two_agent_with_kb.py        # Two-agent system
Setup
Prerequisites
- Python 3.8 or higher
- Virtual environment (recommended)
- API keys for OpenAI and/or Google Gemini (or Ollama for local models)
Installation
1. Clone or navigate to the repository:
   cd multi-agent-systems-for-rheumatoid-arthritis-diagnosis
2. Create and activate a virtual environment:
   # Linux/Mac
   python3 -m venv venv
   source venv/bin/activate
   # Windows
   python -m venv venv
   venv\Scripts\activate
3. Install dependencies:
   pip install -r requirements.txt
4. Configure environment variables:
   cp .env.example .env
   # Edit .env and add your API keys
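As a rough illustration of the configuration step, the sketch below reads a `.env` file with the standard library only and reports which keys are still missing. It is not taken from the project's scripts. The variable names `OPENAI_API_KEY` and `GOOGLE_API_KEY` are assumptions, so check `.env.example` for the names the scripts actually expect.

```python
import os
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Export values from a .env file without overriding existing variables."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def missing_keys(required, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `load_env_file()` followed by `missing_keys(["OPENAI_API_KEY", "GOOGLE_API_KEY"])` (assumed names) gives a quick check before starting a long experiment run.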
Usage
Running Individual Scripts
1. Agent Without Knowledge Base
python scripts/agent_without_kb.py --provider GOOGLE --model gemini-2.0-flash --results_dir results/agent_wkb
2. Agent With Knowledge Base
python scripts/one_agent_with_kb.py --provider GOOGLE --model gemini-2.5-pro-preview-03-25 --results_dir results/agent_kb
3. Two-Agent System
python scripts/two_agent_with_kb.py --provider OPENAI --model o1 --results_dir results/two_agent_kb
Using the Run Script
The run.sh script provides a convenient way to run multiple experiments:
chmod +x run.sh
./run.sh
Edit run.sh to configure which models and providers to test.
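The contents of run.sh are not reproduced here, so the following is only a sketch of the same idea in Python: a list of experiments expanded into the command lines shown above. The `EXPERIMENTS` list and the helper names are illustrative, not part of the repository.

```python
import shlex

# Illustrative experiment matrix built from the example commands above.
# Each entry is (script, provider, model, results_dir).
EXPERIMENTS = [
    ("scripts/agent_without_kb.py", "GOOGLE", "gemini-2.0-flash", "results/agent_wkb"),
    ("scripts/one_agent_with_kb.py", "GOOGLE", "gemini-2.5-pro-preview-03-25", "results/agent_kb"),
    ("scripts/two_agent_with_kb.py", "OPENAI", "o1", "results/two_agent_kb"),
]


def build_command(script, provider, model, results_dir):
    """Return the argv list for one experiment."""
    return [
        "python", script,
        "--provider", provider,
        "--model", model,
        "--results_dir", results_dir,
    ]


def build_all(experiments=EXPERIMENTS):
    """Return one argv list per configured experiment."""
    return [build_command(*experiment) for experiment in experiments]


def as_shell_lines(commands):
    """Render argv lists as shell-quoted command lines for inspection."""
    return [shlex.join(argv) for argv in commands]
```

Printing `as_shell_lines(build_all())` shows what would run. Passing each argv list to `subprocess.run(argv, check=True)` would execute the experiments one after another.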
Supported Providers and Models
| Provider | Models | Notes |
|---|---|---|
| OPENAI | o1, o3-mini, gpt-4 | Requires OpenAI API key |
| GOOGLE | gemini-2.0-flash, gemini-2.5-pro-preview-03-25 | Requires Google API key |
| OLLAMA | deepseek-r1:70b, qwq | Requires local Ollama installation |
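The provider and model combinations listed here can be checked before any API call is made. The sketch below mirrors the `--provider`, `--model`, and `--results_dir` flags from the usage examples. How the actual scripts parse and validate their arguments is not documented here, so the `results` default and the rejection of unlisted models are assumptions.

```python
import argparse

# Provider/model pairs as listed in this README.
SUPPORTED_MODELS = {
    "OPENAI": ["o1", "o3-mini", "gpt-4"],
    "GOOGLE": ["gemini-2.0-flash", "gemini-2.5-pro-preview-03-25"],
    "OLLAMA": ["deepseek-r1:70b", "qwq"],
}


def build_parser():
    """Create a parser with the three flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="RA diagnosis experiment (sketch)")
    parser.add_argument("--provider", required=True, choices=sorted(SUPPORTED_MODELS))
    parser.add_argument("--model", required=True)
    # Assumed default; the real scripts may require this flag.
    parser.add_argument("--results_dir", default="results")
    return parser


def parse_args(argv=None):
    """Parse arguments and reject models not listed for the chosen provider."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.model not in SUPPORTED_MODELS[args.provider]:
        parser.error(f"model {args.model!r} is not listed for provider {args.provider}")
    return args
```

Failing fast on a mismatched pair such as `--provider OPENAI --model qwq` avoids discovering the mistake only after a batch of experiments has started.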
Agent Architectures
1. Single Agent Without Knowledge Base
- Direct diagnosis from patient symptoms
- No external medical knowledge integration
- Baseline for comparison
2. Single Agent With Knowledge Base
- RAG-enhanced diagnosis
- Retrieves relevant medical information from ChromaDB
- Aims to improve accuracy by grounding the diagnosis in domain knowledge
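To make the retrieve-then-prompt flow concrete, here is a toy sketch that uses only the standard library. It swaps ChromaDB's embedding search for a simple bag-of-words cosine similarity, so it illustrates the shape of RAG rather than the project's implementation. The knowledge snippets and the prompt wording are invented placeholders, not entries from the project's knowledge base and not medical guidance.

```python
import math
import re
from collections import Counter

# Placeholder snippets standing in for knowledge base entries.
KNOWLEDGE = [
    "Symmetric swelling of small hand joints with prolonged morning stiffness.",
    "Knee pain after exercise that improves with rest.",
    "Skin rash after sun exposure with fatigue.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(symptoms, documents, k=2):
    """Combine retrieved context and patient symptoms into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(symptoms, documents, k))
    return (
        f"Context:\n{context}\n\n"
        f"Patient symptoms: {symptoms}\n"
        "Is this consistent with RA? Explain your reasoning."
    )
```

In the real pipeline the `retrieve` step would be a vector store query and the prompt would be sent to the selected LLM.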
3. Two-Agent System
- Agent 1: Diagnosis agent
- Agent 2: Reasoning agent
- Collaborative decision-making process
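The two-agent hand-off can be sketched with plain callables. The project lists langgraph for orchestration, but this example deliberately avoids it and uses stub functions in place of LLM calls. The order of the agents (diagnosis first, reasoning second) and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Diagnosis:
    label: str
    reasoning: str


def run_two_agents(
    symptoms: str,
    diagnose: Callable[[str], str],
    explain: Callable[[str, str], str],
) -> Diagnosis:
    """Run the diagnosis agent, then pass its label to the reasoning agent."""
    label = diagnose(symptoms)
    reasoning = explain(symptoms, label)
    return Diagnosis(label=label, reasoning=reasoning)


def stub_diagnose(symptoms: str) -> str:
    """Stand-in for an LLM call; a keyword rule used purely for the demo."""
    return "RA" if "stiffness" in symptoms.lower() else "Not RA"


def stub_explain(symptoms: str, label: str) -> str:
    """Stand-in for the reasoning agent."""
    return f"Predicted {label} based on: {symptoms}"
```

Keeping the label and the reasoning as separate fields makes it possible to score them independently, which matters when a prediction is right but its explanation is not.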
4. Three-Agent System
- Extended multi-agent collaboration
- More complex reasoning chains
- (Implementation in progress)
Results
Results are saved in CSV format in the results/ directory, organized by agent architecture and model:
results/
├── agent_kb/
│   ├── gemini-2.0-flash.csv
│   └── o1.csv
├── agent_wkb/
│   ├── gemini-2.5-pro-preview-03-25.csv
│   └── deepseek-r1:70b.csv
└── two_agent_kb/
    └── o3-mini.csv
Each CSV contains:
- Patient symptoms
- Predicted diagnosis
- Reasoning/explanation
- Actual diagnosis (ground truth)
- Model metadata
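A result file of this shape can be scored with a few lines of standard-library Python. The column names `predicted` and `actual` are hypothetical, since the exact headers are not listed here, so adjust them to match the CSV files in results/.

```python
import csv
import io


def accuracy(rows, predicted_col="predicted", actual_col="actual"):
    """Fraction of rows whose predicted label matches the ground truth."""
    total = 0
    correct = 0
    for row in rows:
        total += 1
        predicted = row[predicted_col].strip().lower()
        actual = row[actual_col].strip().lower()
        if predicted == actual:
            correct += 1
    return correct / total if total else 0.0


def accuracy_from_csv(path, predicted_col="predicted", actual_col="actual"):
    """Compute accuracy for one results file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return accuracy(csv.DictReader(handle), predicted_col, actual_col)
```

This measures prediction accuracy only. Judging whether the reasoning column is sound still requires expert review, as in the study.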
Research Context
This project investigates the phenomenon of "Right Prediction, Wrong Reasoning" in LLM-based medical diagnosis systems. Key research questions include:
- How do different agent architectures affect diagnostic accuracy?
- Does knowledge base integration improve reasoning quality?
- Can multi-agent systems provide better explanations?
- How do different LLMs compare in medical reasoning tasks?
Dependencies
Key dependencies (see requirements.txt for full list):
- langchain-chroma: Vector database for knowledge base
- langchain-openai: OpenAI model integration
- langchain-google-genai: Google Gemini integration
- langchain-ollama: Local model support
- langgraph: Multi-agent orchestration
- pandas: Data processing
- scikit-learn: ML utilities
- matplotlib: Visualization
Contributing
This is a research project. If you'd like to contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Citation
If you use the PreRAID dataset or this codebase in your research, please cite our paper:
@misc{maharana2025rightpredictionwrongreasoning,
title={Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis},
author={Umakanta Maharana and Sarthak Verma and Avarna Agarwal and Prakashini Mruthyunjaya and Dwarikanath Mahapatra and Sakir Ahmed and Murari Mandal},
year={2025},
eprint={2504.06581},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2504.06581},
}
Contact
For questions or collaborations, please contact:
- Murari Mandal: murarimandal.github.io
- Umakanta Maharana: umakantamaharana.github.io
Acknowledgments
This research is supported by the Science and Engineering Research Board (SERB), India under Grant SRG/2023/001686.
This project uses:
- LangChain for LLM orchestration
- ChromaDB for vector storage
- OpenAI, Google, and Ollama for LLM inference
Parts of the project website were adopted from the Nerfies page.
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Note: This is a research project for educational and experimental purposes. It should not be used for actual medical diagnosis without proper validation and regulatory approval.
