MaxMax2016/rag-chat
A production-ready RAG (Retrieval Augmented Generation) system for chatting with your documents
Xantus - Private RAG Chat System with MCP Integration
Built with privacy in mind • Extensible via MCP • Multi-provider AI support
Features • Quick Start • Architecture • MCP Integration • Configuration • API Reference
Table of Contents
- Overview
- Features
- Architecture
- Quick Start
- MCP Integration
- Configuration
- Usage
- Development
- Troubleshooting
- FAQ
- License
Overview
Xantus is a privacy-first RAG system that lets you chat with your documents using AI. Unlike cloud-only solutions, Xantus can run completely locally or use cloud providers - your choice.
What Makes Xantus Different?
- Privacy-First: With local AI providers, all data stays on your machine
- Extensible: MCP (Model Context Protocol) integration for external tools
- Multiple UIs: Streamlit interface + OpenAI-compatible API
- Multi-Provider: Supports Ollama, OpenAI, Anthropic, and more
- Modular: Swap LLMs, embeddings, vector stores easily
- Production-Ready: Dependency injection, proper error handling, logging
Features
Core Features
- Document Chat: Upload PDFs, DOCX, TXT, Markdown and chat with them
- Semantic Search: RAG-powered retrieval with ChromaDB or Qdrant
- Multiple Interfaces:
  - Clean Streamlit UI for end users
  - RESTful API for integration
  - Python SDK for developers
- Flexible AI Backends:
  - Local: Ollama (privacy-first)
  - Cloud: OpenAI, Anthropic
  - Hybrid: Cloud LLM + local embeddings
Advanced Features
- MCP Integration: Connect external tools (calculator, file system, databases)
- Configurable: YAML + environment variables
- Multiple Vector Stores: ChromaDB, Qdrant
- RAG Tuning: Adjust chunk size, overlap, top-k retrieval
- Secure: API key management via environment variables
- Scalable: Async API with proper dependency injection
Architecture
Xantus is built on a modern, modular architecture:
┌─────────────────────────────────────────────────────────┐
│                          User                           │
└────────────┬────────────────────────────┬───────────────┘
             │                            │
    ┌────────▼────────┐          ┌────────▼─────────┐
    │  Streamlit UI   │          │   API Clients    │
    │   (Port 8501)   │          │   (curl, SDK)    │
    └────────┬────────┘          └────────┬─────────┘
             │                            │
             └────────────┬───────────────┘
                          │
                 ┌────────▼─────────┐
                 │  FastAPI Server  │
                 │   (Port 8000)    │
                 └────────┬─────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
   ┌──────▼──────┐ ┌──────▼───────┐ ┌────▼─────┐
   │ Chat Service│ │Ingest Service│ │   MCP    │
   └──────┬──────┘ └──────┬───────┘ │ Service  │
          │               │         └────┬─────┘
          │               │              │
   ┌──────▼───────────────▼──────────────▼─────┐
   │      Dependency Injection Container       │
   │  (LLM • Embeddings • Vector Store • MCP)  │
   └────────────────────┬──────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        │               │               │
   ┌────▼────┐    ┌─────▼─────┐     ┌───▼────┐
   │   LLM   │    │ Embeddings│     │ Vector │
   │Provider │    │  Provider │     │ Store  │
   └─────────┘    └───────────┘     └────────┘
   │Ollama   │    │HuggingFace│     │Chroma  │
   │OpenAI   │    │  Ollama   │     │Qdrant  │
   │Anthropic│    │  OpenAI   │     └────────┘
   └─────────┘    └───────────┘

                                    ┌──────────┐
                                    │MCP Server│
                                    │TypeScript│
                                    └──────────┘
                                    │Calculator│
                                    │FileSystem│
                                    │TextProc  │
                                    └──────────┘
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI + Python 3.10+ | High-performance async API |
| RAG Framework | LlamaIndex | Document indexing & retrieval |
| UI | Streamlit | User-friendly chat interface |
| Configuration | Pydantic + YAML | Type-safe settings |
| DI | Injector | Clean dependency injection |
| Vector DB | ChromaDB / Qdrant | Semantic search |
| MCP | Model Context Protocol | External tool integration |
Project Structure
xantus/
├── .env.example # Environment variable template
├── .gitignore # Git ignore patterns
├── config.yaml # Main configuration file
├── requirements.txt # Python dependencies
├── setup_mcp.sh # MCP setup automation
├── start_api.sh # API server startup script
├── start_ui.sh # UI startup script
│
├── xantus/ # Main application package
│ ├── __init__.py
│ ├── main.py # FastAPI application entry
│ ├── container.py # Dependency injection setup
│ │
│ ├── api/ # API endpoints
│ │ ├── chat_router.py # /v1/chat/completions
│ │ ├── ingest_router.py # /v1/ingest/*
│ │ └── embeddings_router.py # /v1/embeddings
│ │
│ ├── services/ # Business logic
│ │ ├── chat_service.py # RAG-powered chat
│ │ ├── ingest_service.py # Document processing
│ │ └── mcp_service.py # MCP tool orchestration
│ │
│ ├── components/ # Component factories
│ │ ├── llm/
│ │ │ └── llm_factory.py # LLM provider factory
│ │ ├── embeddings/
│ │ │ └── embedding_factory.py
│ │ └── vector_store/
│ │ └── vector_store_factory.py
│ │
│ ├── models/ # Data models
│ │ └── schemas.py # Pydantic request/response models
│ │
│ └── config/ # Configuration
│ └── settings.py # Settings management with Pydantic
│
├── ui/ # User interface
│ └── streamlit_app.py # Streamlit chat application
│
├── mcp-servers/ # MCP integration (git submodules)
│ └── mcp-starter-template-ts/ # TypeScript MCP server
│ ├── dist/ # Compiled JavaScript
│ │ └── start.js # Entry point
│ └── src/ # TypeScript source
│ └── tools/ # Tool implementations
│
├── data/ # Data directory (gitignored)
│ └── vector_store/ # Persisted vector embeddings
│
└── docs/ # Documentation
├── MCP_INTEGRATION.md # MCP technical guide
├── README_MCP.md # MCP quick start
└── SETUP_COMPLETE.md # Setup summary
Quick Start
Prerequisites
- Python 3.10+ (check: python --version)
- Node.js 18+ (for MCP integration; check: node --version)
- Git (for cloning submodules)
- (Optional) Ollama (for local AI)
Installation
Step 1: Clone the Repository
# Clone with MCP submodules
git clone --recurse-submodules https://github.com/onamfc/rag-chat
# git names the checkout after the repository, so the directory is rag-chat
cd rag-chat
# OR if you already cloned without submodules:
git submodule update --init --recursive
Step 2: Create Virtual Environment
# Create virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # Linux/Mac
# OR
venv\Scripts\activate # Windows
Step 3: Install Python Dependencies
pip install -r requirements.txt
Step 4: Set Up MCP (Optional but Recommended)
# This will:
# - Initialize MCP submodules
# - Install npm dependencies
# - Build TypeScript MCP server
./setup_mcp.sh
Step 5: Configure Environment Variables
# Copy the example file
cp .env.example .env
# Edit .env and add your API keys (if using cloud providers)
# For Anthropic:
XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here
# For OpenAI:
# XANTUS_LLM__API_KEY=sk-your-openai-key-here
Step 6: Configure Settings
Edit config.yaml to choose your providers:
Option A: Completely Local (Privacy-First)
llm:
  provider: ollama
  model: llama3.2
embedding:
  provider: huggingface
  model: BAAI/bge-small-en-v1.5
mcp:
  enabled: true # Enable MCP tools
Option B: Cloud-Powered (Anthropic)
llm:
  provider: anthropic
  model: claude-sonnet-4-20250514
  api_key: null # Read from .env
embedding:
  provider: huggingface # Keep embeddings local
  model: BAAI/bge-small-en-v1.5
mcp:
  enabled: true
Option C: OpenAI
llm:
  provider: openai
  model: gpt-4
  api_key: null # Read from .env
embedding:
  provider: openai
  model: text-embedding-3-small
  api_key: null
First Run
Start the API Server
# Option 1: Using the startup script
./start_api.sh
# Option 2: Manual start
python -m xantus.main
# The API will be available at http://localhost:8000
# API docs at http://localhost:8000/docs
You should see:
INFO - Starting Xantus application...
INFO - Loaded settings with LLM provider: anthropic
INFO - Dependency injection container initialized
INFO - Starting server on 127.0.0.1:8000
With MCP enabled, you'll also see:
INFO - Starting MCP server 'mcp-starter-template': node mcp-servers/...
INFO - Loaded 4 tools from 'mcp-starter-template': ['calculate', 'filesystem', 'text-processing', 'weather']
Start the UI (In a New Terminal)
# Activate venv again
source venv/bin/activate
# Start Streamlit
streamlit run ui/streamlit_app.py
# The UI will open in your browser at http://localhost:8501
Upload a Document and Chat!
- Click "Upload Document" in the sidebar
- Select a PDF, TXT, DOCX, or Markdown file
- Wait for processing (you'll see the progress)
- Ask questions about your document!
Example Questions:
- "What is the main topic of this document?"
- "Summarize the key points"
- "Calculate the total revenue mentioned in section 3" (uses MCP calculator)
- "Compare this with the file in ../reports/2023.pdf" (uses MCP filesystem)
MCP Integration
MCP (Model Context Protocol) allows Claude to use external tools while answering questions.
What Tools Are Available?
Your TypeScript MCP server (in mcp-servers/mcp-starter-template-ts/) provides:
| Tool | Function | Example Use |
|---|---|---|
| Calculator | Mathematical operations | "Calculate the sum of Q1-Q4 revenues" |
| File System | Read/write/list files | "Compare with last year's report in ../reports/" |
| Text Processing | Word count, sentiment, case conversion | "Analyze sentiment of customer feedback" |
| Weather | Weather data (mock) | "Check weather for event planning" |
MCP Architecture
User Question
↓
Xantus retrieves document context (RAG)
↓
Sends to Claude with available MCP tools
↓
Claude decides to use a tool (e.g., calculator)
↓
Xantus forwards tool call to MCP server (TypeScript)
↓
MCP server executes tool and returns result
↓
Claude incorporates result into answer
↓
User gets comprehensive response
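The flow above can be sketched as a small loop. This is only an illustration under stated assumptions, not Xantus's actual implementation: `fake_llm`, `TOOLS`, `answer_with_tools`, and the message shapes are hypothetical stand-ins for the LLM provider and the MCP server.

```python
from typing import Callable

# Hypothetical tool registry standing in for the MCP server.
TOOLS: dict[str, Callable[[dict], str]] = {
    "calculate": lambda args: str(args["a"] + args["b"]),
}


def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for the LLM: request a tool once, then answer with its result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"type": "tool_call", "name": "calculate", "args": {"a": 120, "b": 80}}
    return {"type": "answer", "content": f"The total is {tool_results[-1]['content']}."}


def answer_with_tools(question: str, context: str, llm=fake_llm, max_steps: int = 5) -> str:
    """Run the context -> LLM -> tool -> LLM loop until the LLM produces an answer."""
    messages = [
        {"role": "system", "content": f"Context:\n{context}"},
        {"role": "user", "content": question},
    ]
    for _ in range(max_steps):
        reply = llm(messages)
        if reply["type"] == "answer":
            return reply["content"]
        # Forward the tool call to the (hypothetical) tool server and feed the result back.
        result = TOOLS[reply["name"]](reply["args"])
        messages.append({"role": "tool", "name": reply["name"], "content": result})
    raise RuntimeError("LLM did not produce an answer within max_steps")


print(answer_with_tools("What is Q1 plus Q2 revenue?", "Q1: 120, Q2: 80"))
```

The `max_steps` cap is a design choice for the sketch: it keeps a model that repeatedly asks for tools from looping forever.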
Enabling/Disabling MCP
In config.yaml:
mcp:
  enabled: true # Set to false to disable MCP
  servers:
    - name: "mcp-starter-template"
      command: "node"
      args: ["mcp-servers/mcp-starter-template-ts/dist/start.js"]
Adding More MCP Servers
You can connect multiple MCP servers:
mcp:
  enabled: true
  servers:
    # Your custom tools
    - name: "my-tools"
      command: "node"
      args: ["mcp-servers/mcp-starter-template-ts/dist/start.js"]
    # Database access
    - name: "postgres"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    # Web search
    - name: "brave-search"
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-brave-search"]
MCP Documentation
For complete MCP setup and customization:
- Quick Start: README_MCP.md
Configuration
Environment Variables
Create a .env file in the project root:
# ===== LLM API Keys =====
# For Anthropic (double underscore for nested config!)
XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here
# For OpenAI
# XANTUS_LLM__API_KEY=sk-your-openai-key-here
# ===== Embedding API Keys (optional) =====
# XANTUS_EMBEDDING__API_KEY=sk-your-key-here
# ===== Override Other Settings =====
# Format: XANTUS_<SECTION>__<KEY>=value
# Examples:
# XANTUS_LLM__TEMPERATURE=0.5
# XANTUS_RAG__SIMILARITY_TOP_K=10
# XANTUS_SERVER__PORT=8001
Important: Use double underscore (__) for nested configuration!
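To see why the double underscore matters, here is a minimal sketch of how `XANTUS_<SECTION>__<KEY>` names could map onto nested settings. The `parse_env_overrides` helper is hypothetical and written for illustration; Xantus presumably delegates this to its Pydantic-based settings rather than to code like this.

```python
def parse_env_overrides(environ: dict[str, str], prefix: str = "XANTUS_") -> dict:
    """Turn XANTUS_<SECTION>__<KEY>=value pairs into a nested dict of overrides."""
    overrides: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Double underscore separates nesting levels; single underscores stay in the key.
        path = name[len(prefix):].lower().split("__")
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return overrides


env = {
    "XANTUS_LLM__TEMPERATURE": "0.5",
    "XANTUS_RAG__SIMILARITY_TOP_K": "10",
    "XANTUS_LLM_API_KEY": "oops",  # single underscore: not nested under "llm"
    "PATH": "/usr/bin",
}
print(parse_env_overrides(env))
```

In this sketch the single-underscore variable ends up as a top-level `llm_api_key` entry instead of `llm.api_key`, which is the kind of mismatch that leaves the nested key unset.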
Provider Setup
Local Setup with Ollama
- Install Ollama: https://ollama.com/download
- Start Ollama:
  ollama serve
- Pull models:
  ollama pull llama3.2          # For chat
  ollama pull nomic-embed-text  # For embeddings
- Configure config.yaml:
  llm:
    provider: ollama
    model: llama3.2
    api_base: http://localhost:11434 # Default
  embedding:
    provider: ollama
    model: nomic-embed-text
Anthropic Setup
- Get API Key: https://console.anthropic.com/
- Add to .env:
  XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here
- Configure config.yaml:
  llm:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: null # Read from environment
    temperature: 0.7
    max_tokens: 4096
  embedding:
    provider: huggingface # Use local for cost savings
    model: BAAI/bge-small-en-v1.5
OpenAI Setup
- Get API Key: https://platform.openai.com/api-keys
- Add to .env:
  XANTUS_LLM__API_KEY=sk-your-openai-key-here
- Configure config.yaml:
  llm:
    provider: openai
    model: gpt-4-turbo-preview
    api_key: null
  embedding:
    provider: openai
    model: text-embedding-3-small
    api_key: null
RAG Tuning
Fine-tune retrieval in config.yaml:
rag:
  # Number of relevant chunks to retrieve
  similarity_top_k: 5
  # Size of text chunks (characters)
  chunk_size: 1024
  # Overlap between chunks (prevents context loss)
  chunk_overlap: 200
  # Enable advanced reranking (requires additional setup)
  enable_reranking: false
Tuning Guidelines:
- Larger chunks (1024-2048): Better for long-form content
- Smaller chunks (512-1024): Better for specific facts
- Higher top_k (8-10): More context but slower
- Lower top_k (3-5): Faster but may miss context
- Overlap: 15-20% of chunk_size is recommended
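To make the interaction between chunk_size and chunk_overlap concrete, the following sketch splits text into fixed-size character windows. It is an illustration only: `chunk_text` is a hypothetical helper, and the splitter Xantus actually uses (via LlamaIndex) may respect sentence boundaries or count tokens rather than characters.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("x" * 3000, chunk_size=1024, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])  # 4 [1024, 1024, 1024, 528]
```

With the defaults, each window starts 824 characters after the previous one, so a 200-character overlap is roughly 20% of the chunk size, in line with the guideline above.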
Vector Store Configuration
vector_store:
  provider: chroma # or qdrant
  # Path to persist vector data
  persist_path: ./data/vector_store
  # Collection name
  collection_name: xantus_documents
Server Configuration
server:
  host: 127.0.0.1 # Change to 0.0.0.0 for network access
  port: 8000
  # CORS settings
  cors_enabled: true
  cors_origins:
    - "*" # Be more restrictive in production!
Usage
Streamlit UI
The easiest way to use Xantus:
- Start the API (terminal 1):
  ./start_api.sh
- Start the UI (terminal 2):
  ./start_ui.sh
  # OR
  streamlit run ui/streamlit_app.py
- Navigate to http://localhost:8501
- Upload documents via the sidebar
- Chat with your documents!
Features:
- ✅ Document upload with progress
- ✅ Document management (list/delete)
- ✅ Chat history
- ✅ Context toggle (use RAG or not)
- ✅ Health monitoring
API Endpoints
Health Check
curl http://localhost:8000/health
Response:
{
  "status": "healthy",
  "version": "0.1.0",
  "components": {
    "llm": "anthropic",
    "embedding": "huggingface",
    "vector_store": "chroma"
  }
}
Chat Completion (with RAG)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the main findings in the report?"}
    ],
    "use_context": true,
    "stream": false
  }'
Response:
{
  "id": "chat-123abc",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "claude-sonnet-4-20250514",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Based on the documents, the main findings are..."
    },
    "finish_reason": "stop"
  }]
}
Upload Document
curl -X POST http://localhost:8000/v1/ingest/file \
  -F "file=@/path/to/document.pdf"
Response:
{
  "status": "success",
  "document_id": "doc_abc123",
  "chunks_created": 42
}
List Documents
curl http://localhost:8000/v1/ingest/documents
Delete Document
curl -X DELETE http://localhost:8000/v1/ingest/documents/doc_abc123
Generate Embeddings
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "Text to embed", "model": "default"}'
Python Client
import requests

# Reuse one session so connections are pooled across requests
session = requests.Session()
api_url = "http://localhost:8000"

# Upload a document
with open("document.pdf", "rb") as f:
    response = session.post(
        f"{api_url}/v1/ingest/file",
        files={"file": f},
    )
response.raise_for_status()  # Fail early on HTTP errors
print(f"Uploaded: {response.json()}")

# Chat with RAG
response = session.post(
    f"{api_url}/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Summarize the key points"}
        ],
        "use_context": True,
        "stream": False,
    },
)
response.raise_for_status()
result = response.json()
print(result["choices"][0]["message"]["content"])
Development
Project Philosophy
- Privacy First: Default to local, support cloud
- Modularity: Easy to swap any component
- Simplicity: Minimal abstractions
- Type Safety: Pydantic everywhere
- Production Ready: Proper DI, error handling, logging
Adding a New LLM Provider
- Add to settings (xantus/config/settings.py):
  provider: Literal["ollama", "openai", "anthropic", "your-provider"]
- Implement factory (xantus/components/llm/llm_factory.py):
  def _create_your_provider_llm(config: LLMConfig) -> LLM:
      return YourProviderLLM(
          model=config.model,
          api_key=config.api_key,
          temperature=config.temperature,
      )
- Update factory dispatch:
  elif config.provider == "your-provider":
      return _create_your_provider_llm(config)
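The three steps can be tried out in isolation with a self-contained sketch like the one below. Everything in it is hypothetical: `LLMConfig` only imitates the real settings model, `EchoLLM` is a toy provider, and the dictionary-based dispatch is an alternative to the `elif` chain described above, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LLMConfig:
    """Hypothetical stand-in for the settings model in xantus/config/settings.py."""
    provider: str
    model: str
    api_key: Optional[str] = None
    temperature: float = 0.7


@dataclass
class EchoLLM:
    """Toy provider used only to make the example runnable."""
    model: str
    temperature: float

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


def _create_echo_llm(config: LLMConfig) -> EchoLLM:
    return EchoLLM(model=config.model, temperature=config.temperature)


# Dispatch table: one entry per provider instead of a growing if/elif chain.
_FACTORIES: dict[str, Callable[[LLMConfig], object]] = {
    "echo": _create_echo_llm,
}


def create_llm(config: LLMConfig):
    """Look up the provider's factory and build the LLM, rejecting unknown providers."""
    try:
        factory = _FACTORIES[config.provider]
    except KeyError:
        raise ValueError(f"Unsupported LLM provider: {config.provider!r}") from None
    return factory(config)


llm = create_llm(LLMConfig(provider="echo", model="demo"))
print(llm.complete("hello"))  # [demo] hello
```

Whichever dispatch style is used, failing loudly on an unknown provider name keeps configuration typos from surfacing later as confusing runtime errors.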
Adding a New Vector Store
Similar process in xantus/components/vector_store/vector_store_factory.py
Code Style
# Format code
black xantus/
# Lint
ruff check xantus/
# Type check
mypy xantus/
Testing
# Install test dependencies
pip install pytest pytest-asyncio
# Run tests
pytest tests/
Troubleshooting
Common Issues
1. "Cannot connect to Ollama"
Solution: Ensure Ollama is running
ollama serve
2. "ValueError: Anthropic API key is required"
Solution: Check your .env file:
# Correct (double underscore!):
XANTUS_LLM__API_KEY=sk-ant-...
# Wrong (single underscore):
XANTUS_LLM_API_KEY=sk-ant-...
3. "Import error: No module named 'xantus'"
Solution: Run from the repository root (the directory that contains the xantus/ package)
cd /path/to/rag-chat
python -c "import xantus; print('OK')"
4. "MCP server not starting"
Solution: Build the MCP server
./setup_mcp.sh
# OR manually:
cd mcp-servers/mcp-starter-template-ts
npm install
npm run build
5. "Port 8000 already in use"
Solution: Kill existing processes or change port
# Kill existing
pkill -f "python.*xantus"
# OR change port in config.yaml:
server:
  port: 8001
6. "Vector store errors"
Solution: Clear and recreate
rm -rf data/vector_store
mkdir -p data/vector_store
# Restart server, re-upload documents
Debug Mode
Enable verbose logging:
# In xantus/main.py
import logging
logging.basicConfig(level=logging.DEBUG)
FAQ
Q: Does my data leave my machine?
A: Only if you use cloud providers (OpenAI/Anthropic). With Ollama + HuggingFace, everything stays local.
Q: Which is faster - local or cloud?
A: Cloud (OpenAI/Anthropic) is usually faster. Local (Ollama) depends on your hardware.
Q: Can I use multiple documents?
A: Yes! Upload as many as you want. They're all indexed in the vector store.
Q: What's the maximum document size?
A: No hard limit, but larger documents take longer to process.
Q: Can I delete documents?
A: Yes, via the API /v1/ingest/documents/{doc_id} or Streamlit UI.
Q: Is streaming supported?
A: Yes! Set "stream": true in chat completion requests.
Q: What LLM is best?
A:
- Best quality: Claude Sonnet 4, GPT-4
- Best local: Llama 3.2, Mistral
- Best balance: Claude Haiku, GPT-3.5-turbo
Q: How do I add authentication?
A: Add FastAPI middleware in xantus/main.py for API key or OAuth.
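As a starting point for the API key option, the check itself can be written independently of the web framework. This is a sketch under assumptions: the `is_authorized` helper and the `X-API-Key` header name are hypothetical, and wiring the check into FastAPI middleware in xantus/main.py is left out.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_key: str, header_name: str = "x-api-key") -> bool:
    """Return True if the request headers carry the expected API key."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied = normalised.get(header_name.lower(), "")
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


print(is_authorized({"X-API-Key": "secret"}, "secret"))  # True
print(is_authorized({"X-API-Key": "wrong"}, "secret"))   # False
```

The expected key would normally come from an environment variable rather than from source code, in keeping with how the other secrets are handled here.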
Additional Resources
- MCP Quick Start: README_MCP.md
- API Documentation: http://localhost:8000/docs (when running)
- LlamaIndex Docs: https://docs.llamaindex.ai/
- FastAPI Docs: https://fastapi.tiangolo.com/
- Streamlit Docs: https://docs.streamlit.io/
Contributing
Contributions are welcome! This project is designed to be:
- Easy to understand
- Simple to extend
- Well-documented
Feel free to:
- Add new providers
- Improve the UI
- Enhance MCP tools
- Fix bugs
- Improve documentation
License
This project is provided as-is for educational and research purposes.
Built with:
- FastAPI - Modern async web framework
- LlamaIndex - RAG framework
- Streamlit - Data apps framework
- ChromaDB - Vector database
- Ollama - Local LLM runtime
- Model Context Protocol - Tool integration
Made with ❤️ for the open source community
