BookLM - Intelligent Book Recommendation System
A sophisticated book recommendation system that combines the power of AI, vector similarity search, and natural language processing to provide personalized book recommendations.
Features
- AI-Powered Recommendations: Uses OpenAI's language models to provide intelligent, contextual book recommendations
- Semantic Search: Leverages HuggingFace embeddings and ChromaDB for similarity-based book discovery
- Book Comparison: Compare two books with AI-generated insights
- Fast API: Built with FastAPI for high-performance API endpoints
- Vector Database: ChromaDB for efficient similarity search and retrieval
Demo video: BookLM.mp4
Architecture
The system consists of several key components:
- Data Layer: SQLite database storing dataset
- Vector Database: ChromaDB for semantic similarity search
- AI Layer: OpenAI LLM for intelligent recommendations
- API Layer: FastAPI serving REST endpoints
- Frontend: Static HTML/CSS/JS interface
Tools Used:
- OpenAI for providing the language models
- HuggingFace for embedding models
- ChromaDB for vector database functionality
- FastAPI for the web framework
- LangChain for AI/ML orchestration
Prerequisites
- Python 3.8 or higher
- OpenAI API key
- Sufficient disk space for book embeddings
Installation
1. Clone the Repository
git clone https://github.com/mehrdad-dev/BookLM.git
cd BookLM

2. Install Dependencies
pip install -r requirements.txt

3. Environment Setup
Create a .env file in the project root with the following variables:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=your_openai_base_url_here  # optional, only if you use a non-default endpoint
LLM_MODEL=gemma-3-1b-it
# Embedding Model
EMBEDDING_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2
# Database Configuration
CSV_PATH=dataset/Best_books_ever[Cleaned].csv
DB_PATH=books.db
ROWS_LIMIT=100
# Vector Database
INDEX_PATH=chroma_books_index

4. Data Preparation
The original dataset I used for this project:
https://github.com/scostap/goodreads_bbe_dataset
You can find a cleaned version of this dataset in the dataset/ folder.
Ensure you have the book dataset in the dataset/ folder. The system expects a CSV file with the following columns:
- bookId: Unique book identifier
- title: Book title
- author: Book author
- rating: Book rating
- description: Book description
- genres: Book genres
- characters: Book characters
- coverImg: Book cover image URL
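The first-run CSV import described below can be sketched with the standard library. This is a minimal illustration only (the project's actual loading code lives in main.py and may differ); the column names follow the schema above, and `rows_limit` mirrors the ROWS_LIMIT setting:

```python
import csv
import sqlite3
from typing import Optional

def load_books(csv_path: str, db_path: str, rows_limit: Optional[int] = None) -> int:
    """Load the book CSV into a SQLite table, mirroring the first-run import.

    Simplified sketch; returns the number of rows inserted.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS books (
               bookId TEXT PRIMARY KEY, title TEXT, author TEXT,
               rating REAL, description TEXT, genres TEXT,
               characters TEXT, coverImg TEXT)"""
    )
    inserted = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if rows_limit is not None and i >= rows_limit:
                break  # honor the ROWS_LIMIT cap from the .env file
            conn.execute(
                "INSERT OR REPLACE INTO books VALUES (?,?,?,?,?,?,?,?)",
                (row["bookId"], row["title"], row["author"], row["rating"],
                 row["description"], row["genres"], row["characters"],
                 row["coverImg"]),
            )
            inserted += 1
    conn.commit()
    conn.close()
    return inserted
```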
Running the Application
Start the Server
uvicorn main:app --reload

First Run
On the first run, the system will:
- Load book data from CSV into SQLite database
- Create embeddings for book descriptions using HuggingFace
- Store embeddings in ChromaDB for similarity search
- Start the web server
This process may take a few minutes depending on the dataset size.
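Conceptually, the similarity search that ChromaDB performs over the stored embeddings reduces to comparing vectors by direction. A minimal pure-Python illustration of the idea (toy vectors, not the actual ChromaDB internals; real embeddings from all-MiniLM-L6-v2 are 384-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" standing in for the HuggingFace model's output.
query = [0.9, 0.1, 0.0]
books = {
    "fantasy epic": [0.8, 0.2, 0.1],
    "cookbook": [0.0, 0.1, 0.9],
}

# The book whose vector points in the direction closest to the query wins.
best = max(books, key=lambda title: cosine_similarity(query, books[title]))
print(best)  # fantasy epic
```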
Usage
Web Interface
1. Book Recommendations:
   - Navigate to the "Recommendation" tab
   - Enter your book preferences (e.g., "I want a fantasy book about magical worlds")
   - Get AI-powered recommendations with reasoning
2. Book Comparison:
   - Navigate to the "Compare" tab
   - Search book titles
   - Select two books
   - Get AI-generated comparison insights
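Under the hood, the comparison feature amounts to assembling a prompt from the two selected books' metadata and sending it to the LLM. A hypothetical sketch of such a prompt builder (the real template lives in main.py and almost certainly differs in wording):

```python
def build_compare_prompt(book_a: dict, book_b: dict) -> str:
    """Assemble an LLM prompt comparing two books (hypothetical template)."""
    return (
        "Compare the following two books for a reader deciding what to read next.\n\n"
        f"Book 1: {book_a['title']} by {book_a['author']}\n"
        f"Genres: {book_a['genres']}\n"
        f"Description: {book_a['description']}\n\n"
        f"Book 2: {book_b['title']} by {book_b['author']}\n"
        f"Genres: {book_b['genres']}\n"
        f"Description: {book_b['description']}\n\n"
        "Discuss themes, style, and which kind of reader each book suits."
    )

dune = {"title": "Dune", "author": "Frank Herbert",
        "genres": "sci-fi", "description": "A desert-planet epic."}
emma = {"title": "Emma", "author": "Jane Austen",
        "genres": "classic", "description": "A comedy of manners."}
prompt = build_compare_prompt(dune, emma)
```

The returned string would then be passed to the configured LLM (LLM_MODEL) via the OpenAI-compatible API.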
Project Structure
BookLM/
├── main.py
├── requirements.txt
├── README.md
├── books.db                    # SQLite database (auto-generated)
├── chroma_books_index/         # ChromaDB vector database (auto-generated)
├── books_1.Best_Books_Ever.csv
├── dataset/
│   ├── Best_books_ever[Cleaned].csv
│   └── dataset.ipynb
└── static/
    └── index.html
Configuration
Environment Variables
- OPENAI_API_KEY: Your OpenAI API key
- OPENAI_API_BASE: Your OpenAI base URL
- LLM_MODEL: OpenAI model to use (I used gemma-3-1b-it)
- EMBEDDING_MODEL_NAME: HuggingFace embedding model
- CSV_PATH: Path to your book dataset CSV
- DB_PATH: SQLite database file path
- ROWS_LIMIT: Number of books to process (for testing)
- INDEX_PATH: ChromaDB index directory
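At startup these variables are typically read from the environment (python-dotenv, if used, populates os.environ from the .env file). A minimal sketch of such a config loader, using the defaults shown in the .env example above — the function name and dict keys are illustrative, not the project's actual code:

```python
import os

def load_config() -> dict:
    """Read BookLM settings from the environment, falling back to the
    README's example defaults. OPENAI_API_KEY is the only required value."""
    return {
        "openai_api_key": os.environ["OPENAI_API_KEY"],        # required
        "openai_api_base": os.environ.get("OPENAI_API_BASE"),  # optional
        "llm_model": os.environ.get("LLM_MODEL", "gemma-3-1b-it"),
        "embedding_model": os.environ.get(
            "EMBEDDING_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2"),
        "db_path": os.environ.get("DB_PATH", "books.db"),
        "rows_limit": int(os.environ.get("ROWS_LIMIT", "100")),
        "index_path": os.environ.get("INDEX_PATH", "chroma_books_index"),
    }
```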
Performance Tuning
- ROWS_LIMIT: Reduce for faster initial setup, increase for more comprehensive recommendations
- Chunk Size: Modify chunk_size in prepare_documents() for different embedding granularity
- Similarity Search: Adjust the k parameter in query_db() for more or fewer recommendations
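The k parameter is simply a top-k cutoff over similarity scores: the vector store ranks every candidate and returns the k best matches. A conceptual illustration using the standard library (not the actual query_db() implementation):

```python
import heapq

def top_k(scored_books, k=3):
    """Return the k highest-scoring (score, title) pairs, best first.

    Illustrates what a larger or smaller k does to the result set;
    ChromaDB performs the equivalent ranking internally.
    """
    return heapq.nlargest(k, scored_books)

scores = [(0.91, "Dune"), (0.40, "Emma"), (0.85, "Hyperion"), (0.72, "Foundation")]
print(top_k(scores, k=2))  # [(0.91, 'Dune'), (0.85, 'Hyperion')]
```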
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
This project is licensed under the MIT License.
Happy Reading!