# 365 Q&A Chatbot
A Streamlit application that enables users to upload a PDF document and chat with an AI assistant that answers questions using only the content from that PDF. This project implements a Retrieval-Augmented Generation (RAG) pipeline using LangChain and ChromaDB.
## Features
- PDF Upload: Upload any PDF document to build a knowledge base
- Multi-Provider Support: Choose between OpenAI and OpenRouter for LLM access
- Multiple Models: Access to various models, including:
  - OpenAI: GPT-4o, GPT-4o-mini, GPT-3.5-turbo
  - OpenRouter: Google Gemini 2.5 Flash, Llama 3.2, Mixtral 8x7B, and more
- RAG Pipeline: Uses LangChain with OpenAI embeddings and ChromaDB vector store
- Interactive Chat: Ask questions about your PDF content with a conversational interface
- Context-Aware: AI responses are based solely on the uploaded PDF content
- Streaming Responses: Real-time response streaming for better user experience
- Dynamic Configuration: Automatically configures API endpoints and models based on provider selection
## Tech Stack
- Frontend: Streamlit
- LLM Providers: OpenAI, OpenRouter (supporting Google Gemini, Llama, Mixtral, etc.)
- Embeddings: OpenAI text-embedding-3-small
- Vector Store: ChromaDB
- Document Processing: LangChain, PyPDF
## Prerequisites

### Required Environment Variables
- `OPENAI_API_KEY`: Your OpenAI API key (for direct OpenAI models)
- `OPENROUTER_API_KEY`: Your OpenRouter API key (for multi-model access including Google Gemini, Llama, Mixtral, etc.)
Note: You only need one API key depending on which provider you choose to use.
## Setup Instructions
### 1. Clone the Repository

```bash
git clone <your-repository-url>
cd 365-QnA-Chatbot
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Configure Secrets

Create a `.streamlit/secrets.toml` file in your project root:
```toml
# OpenAI API Key (for direct OpenAI models)
OPENAI_API_KEY = "sk-your-openai-api-key-here"

# OpenRouter API Key (for multi-model access)
OPENROUTER_API_KEY = "sk-your-openrouter-api-key-here"
```

Note: Copy the `.streamlit/secrets.toml.example` file and replace the placeholders with your actual API keys. You only need one API key, depending on which provider you choose to use.
### 4. Test Your Installation (Optional but Recommended)

Before running the app, verify that all imports work correctly:

```bash
python test_imports.py
```

This will confirm that all required packages are properly installed.
### 5. Run the Application

```bash
streamlit run streamlit_app.py
```

The application will open in your default web browser at http://localhost:8501.
## Usage
1. Select Provider: Choose between OpenAI or OpenRouter in the sidebar
2. Enter API Key: Provide your API key for the selected provider (or configure it in `secrets.toml`)
3. Select Model: Choose from the available models for your selected provider
4. Upload PDF: Use the file uploader to select a PDF document
5. Process PDF: Click "Process PDF" to build the knowledge base
6. Start Chatting: Once processing is complete, ask questions about your PDF content
7. View History: Your conversation history is maintained throughout the session
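The provider choice drives the app's dynamic configuration: OpenRouter exposes an OpenAI-compatible API, so switching providers largely means switching the base URL and the model namespace. A minimal sketch of that dispatch logic (the `provider_config` helper is hypothetical, not code from this repo):

```python
# Sketch of provider dispatch. OpenRouter speaks the OpenAI wire format,
# so the same client class works for both providers; only the endpoint
# and the model naming differ. `provider_config` is a hypothetical helper.

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

def provider_config(provider: str, api_key: str, model: str) -> dict:
    """Build keyword arguments for an OpenAI-compatible chat client."""
    config = {"api_key": api_key, "model": model}
    if provider == "OpenRouter":
        # Only the base URL changes; the request/response format is the same.
        config["base_url"] = OPENROUTER_BASE_URL
    return config

# Example: OpenRouter model IDs are namespaced by vendor.
print(provider_config("OpenRouter", "sk-or-...", "google/gemini-2.5-flash"))
```

The same pattern explains the "Provider Selection" troubleshooting item below: a key from one provider sent to the other endpoint will be rejected.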
## How It Works
1. Document Loading: The PDF is loaded using `PyPDFLoader` from LangChain
2. Text Splitting: Documents are split into chunks using `TokenTextSplitter` (1000 tokens per chunk, 100-token overlap)
3. Embedding Creation: Text chunks are converted to embeddings using OpenAI's `text-embedding-3-small` model
4. Vector Store: Embeddings are stored in ChromaDB for efficient similarity search
5. Provider Configuration: Based on your selection, the app configures the appropriate API endpoint and model
6. Retrieval: When you ask a question, the most relevant chunks are retrieved from the vector store
7. Generation: The retrieved context is passed to your selected model (GPT, Gemini, Llama, Mixtral, etc.) to generate a response
## Deployment

### Streamlit Community Cloud
1. Fork this repository
2. Go to Streamlit Community Cloud
3. Connect your GitHub account and select your forked repository
4. Add your API keys (`OPENAI_API_KEY` and/or `OPENROUTER_API_KEY`) in the secrets section
5. Deploy!
### Local Deployment
For production deployment, consider using:
- Docker containers
- Cloud platforms (AWS, GCP, Azure)
- VPS with proper security configurations
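For the Docker route, a minimal image might look like the sketch below. The file names and port come from this README; the base image and Python version are assumptions, so adjust them to match your environment:

```dockerfile
# Sketch of a container image for the app.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
# Bind to all interfaces so the container port is reachable from the host.
CMD ["streamlit", "run", "streamlit_app.py", "--server.address=0.0.0.0"]
```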
## File Structure
```
365-QnA-Chatbot/
├── streamlit_app.py             # Main Streamlit application
├── test_imports.py              # Import testing script
├── requirements.txt             # Python dependencies
├── README.md                    # This file
├── .gitignore                   # Git ignore rules
├── .streamlit/
│   └── secrets.toml.example     # Secrets template
└── temp_vectorstore/            # Generated vector store (auto-created)
```
## Troubleshooting

### Common Issues
- Import Errors: If you get `ModuleNotFoundError` during deployment:
  - Run `python test_imports.py` locally to test imports
  - Check that your `requirements.txt` has the correct package versions
  - Ensure all LangChain packages are properly installed
- API Key Error: Ensure your API key is correctly set in `.streamlit/secrets.toml` or entered in the sidebar
- Provider Selection: Make sure you've selected the provider (OpenAI or OpenRouter) that matches your API key
- Model Availability: Some models may not be available on OpenRouter; try a different model if you encounter errors
- PDF Processing Error: Make sure the uploaded file is a valid PDF
- Memory Issues: Large PDFs may require more memory; consider reducing the chunk size
- Rate Limiting: API providers enforce rate limits; consider upgrading your plan for heavy usage
### Performance Tips
- Use smaller chunk sizes for faster processing
- Consider using GPU-accelerated embeddings for large documents
- Implement caching for frequently accessed documents
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License
This project is open source and available under the MIT License.
## Support
If you encounter any issues or have questions, please:
- Check the troubleshooting section above
- Search existing GitHub issues
- Create a new issue with detailed information about your problem
Happy Chatting! 🤖📚