MON3EMPASHA/Semantic-Movie-Discovery-System
Semantic Movie Discovery System is a full-stack web application that enables users to discover movies using natural language queries powered by semantic search and vector similarity. Built with Next.js, Express.js, MongoDB, and Qdrant, it delivers context-aware movie recommendations based on plot, genre, and title meaning rather than keywords.
Semantic Movie Discovery System
A full-stack web application that enables users to discover movies using natural language queries powered by semantic search. Built with Next.js, Express.js, MongoDB, and vector databases for intelligent movie recommendations.
Features
Semantic Search
- Natural Language Queries: Search movies using plain English descriptions
- Vector Similarity: Find movies based on plot, genre, and title similarity
- Context-Aware Results: Get relevant results even with vague queries
Movie Discovery
- Browse Movies: Paginated movie listing with sorting options
- Advanced Filtering: Filter by genres, ratings, release years, directors, and cast
- Movie Details: Comprehensive movie information with posters and trailers
- Similar Movies: Discover movies similar to your favorites
AI-Powered Recommendations
- Enhanced Recommendations: AI-powered movie suggestions
- Genre-Based Suggestions: Find movies in your preferred genres
- Rating-Based Filtering: Get recommendations based on quality ratings
Image Management
- GridFS Storage: Efficient poster image storage using MongoDB GridFS
- Image Optimization: Automatic image optimization with Sharp
- Poster Backfilling: Automatic download of missing poster images
Admin Features
- CRUD Operations: Full movie management interface
- Bulk Operations: Delete multiple movies at once
- Import/Export: Export movies as JSON or CSV
- Analytics Dashboard: View search and storage statistics
- Movie Ingestion: Add movies with automatic embedding generation
Architecture
┌─────────────────┐
│    Frontend     │
│  (Next.js 16)   │
│    React 19     │
└────────┬────────┘
         │ HTTP/REST API
         │
┌────────▼────────┐
│   Backend API   │
│   (Express 5)   │
│   TypeScript    │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
┌───▼───┐ ┌───▼──────┐
│MongoDB│ │ Vector DB│
│       │ │(Qdrant/  │
│GridFS │ │Pinecone) │
└───────┘ └──────────┘
Technology Stack
Backend
- Runtime: Node.js 18+
- Framework: Express.js 5.1.0
- Language: TypeScript 5.9.3
- Database: MongoDB (via Mongoose 8.20.1)
- Vector Database: Qdrant / Pinecone (configurable)
- ML/AI:
  - @xenova/transformers 2.17.2 (local embeddings)
  - Sentence Transformers model: all-MiniLM-L6-v2 (384 dimensions)
- Image Processing: Sharp 0.34.5
- File Upload: Multer 2.0.2
- Validation: Zod 4.1.13
- Logging: Winston 3.19.0
Frontend
- Framework: Next.js 16.0.6
- UI Library: React 19.2.0
- Styling: Tailwind CSS 4
- Language: TypeScript 5
- Fonts: Geist Sans & Geist Mono
Quick Start
Prerequisites
- Node.js 18 or higher
- MongoDB 6 or higher (local or cloud)
- Qdrant (or Pinecone account)
- npm or yarn
Installation
- Clone the repository:

      git clone <repository-url>
      cd "Adv Db project"

- Set up the backend:

      cd backend
      npm install

- Configure the backend environment. Create a .env file in the backend directory:

      # Server Configuration
      NODE_ENV=development
      PORT=4000

      # MongoDB
      MONGODB_URI=mongodb://localhost:27017/movies

      # Vector Database (Qdrant)
      VECTOR_DB_PROVIDER=qdrant
      VECTOR_DB_URL=http://localhost:6333
      VECTOR_COLLECTION=movies
      VECTOR_DIMENSION=384

      # Embedding Model
      EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

      # Optional: TMDB API for seeding movies
      TMDB_API_KEY=your_tmdb_api_key_here

- Set up the frontend:

      cd ../frontend
      npm install

- Configure the frontend environment. Create a .env.local file in the frontend directory:

      NEXT_PUBLIC_API_BASE_URL=http://localhost:4000/api
Running the Application
Start MongoDB
Using Docker:
docker run -d -p 27017:27017 --name mongodb mongo:latest
Or use your local MongoDB installation.
Start Qdrant Vector Database
Using Docker:
docker run -d -p 6333:6333 --name qdrant qdrant/qdrant
Or use Qdrant Cloud (update VECTOR_DB_URL and VECTOR_DB_API_KEY in .env).
Start Backend Server
cd backend
npm run dev
The backend API will be available at http://localhost:4000
Start Frontend Development Server
cd frontend
npm run dev
The frontend application will be available at http://localhost:3000
Seed the Database (Optional)
To populate the database with sample movies from TMDB:
- Get a TMDB API key (free):
  - Sign up at https://www.themoviedb.org/signup
  - Go to Settings → API → Request API Key
  - Choose the "Developer" option
  - Copy your API key
- Add it to the .env file:

      TMDB_API_KEY=your_api_key_here

- Run the seeding script:

      cd backend
      npm run seed:movies

This will fetch ~100 popular movies, download their posters, and generate embeddings for semantic search.
Usage
Semantic Search
- Navigate to the home page (http://localhost:3000)
- Enter a natural language query, for example:
  - "movies about time travel"
  - "sci-fi movies with robots"
  - "emotional dramas"
- View search results ranked by semantic similarity
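Ranking by semantic similarity can be illustrated with a minimal in-memory sketch. It assumes cosine similarity as the metric, which is a common choice for sentence-transformer embeddings but is not confirmed by this README. The `rankBySimilarity` helper and the toy vectors are hypothetical stand-ins for the vector database lookup.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface ScoredMovie {
  title: string;
  score: number;
}

// Hypothetical in-memory stand-in for the vector database query.
function rankBySimilarity(
  queryVector: number[],
  movies: { title: string; embedding: number[] }[],
  limit = 10,
): ScoredMovie[] {
  return movies
    .map((m) => ({ title: m.title, score: cosineSimilarity(queryVector, m.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit);
}

// Toy 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
const ranked = rankBySimilarity(
  [1, 0, 0],
  [
    { title: "A", embedding: [0, 1, 0] },
    { title: "B", embedding: [1, 0, 0] },
    { title: "C", embedding: [1, 1, 0] },
  ],
);
console.log(ranked.map((r) => r.title)); // [ 'B', 'C', 'A' ]
```

In the running system this comparison would happen inside the vector database rather than in application code.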
Browse Movies
- Visit /movies to see all movies with pagination
- Use filters to narrow down by genre, rating, year, etc.
- Click on any movie card to view detailed information
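As a rough illustration of how such filters and pagination might be translated into a MongoDB query, here is a small sketch. The filter shape and the field names (`genres`, `rating`, `releaseYear`) are assumptions made for the example and may not match the project's Mongoose schema. The operators `$in`, `$gte`, and `$lte` are standard MongoDB query operators.

```typescript
// Hypothetical filter shape; the project's real field names may differ.
interface MovieFilters {
  genres?: string[];
  minRating?: number;
  yearFrom?: number;
  yearTo?: number;
  page?: number;
  pageSize?: number;
}

interface MovieQuery {
  filter: Record<string, unknown>;
  skip: number;
  limit: number;
}

function buildMovieQuery(f: MovieFilters): MovieQuery {
  const filter: Record<string, unknown> = {};
  if (f.genres && f.genres.length > 0) filter.genres = { $in: f.genres };
  if (f.minRating !== undefined) filter.rating = { $gte: f.minRating };
  if (f.yearFrom !== undefined || f.yearTo !== undefined) {
    const range: Record<string, number> = {};
    if (f.yearFrom !== undefined) range.$gte = f.yearFrom;
    if (f.yearTo !== undefined) range.$lte = f.yearTo;
    filter.releaseYear = range;
  }
  // Clamp paging values so a bad request cannot ask for a huge page.
  const pageSize = Math.min(Math.max(f.pageSize ?? 20, 1), 100);
  const page = Math.max(f.page ?? 1, 1);
  return { filter, skip: (page - 1) * pageSize, limit: pageSize };
}

const movieQuery = buildMovieQuery({
  genres: ["Sci-Fi"],
  minRating: 7,
  yearFrom: 2000,
  page: 3,
  pageSize: 10,
});
console.log(JSON.stringify(movieQuery));
```

With Mongoose, the result could then be passed along the lines of `Movie.find(filter).skip(skip).limit(limit)`, where `Movie` is a hypothetical model name.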
Find Similar Movies
- Visit /find-similar
- Search or browse to select a movie
- View the most similar movie based on plot, genre, and title
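The README does not say how the plot, genre, and title similarities are combined into a single result. One plausible approach is a weighted average, sketched below. The weights and the `mostSimilar` helper are purely illustrative assumptions, not the project's actual scoring logic.

```typescript
// Per-field similarity scores for one candidate movie.
interface FieldScores {
  title: number;
  genre: number;
  plot: number;
}

// Hypothetical weights; the source does not specify how fields are weighted.
const DEFAULT_WEIGHTS: FieldScores = { title: 0.2, genre: 0.3, plot: 0.5 };

// Weighted average of the three field scores.
function combinedScore(s: FieldScores, w: FieldScores = DEFAULT_WEIGHTS): number {
  const total = w.title + w.genre + w.plot;
  if (total <= 0) throw new Error("Weights must sum to a positive number");
  return (s.title * w.title + s.genre * w.genre + s.plot * w.plot) / total;
}

// Return the id of the candidate with the highest combined score.
function mostSimilar(candidates: { id: string; scores: FieldScores }[]): string | undefined {
  let best: { id: string; score: number } | undefined;
  for (const c of candidates) {
    const score = combinedScore(c.scores);
    if (!best || score > best.score) best = { id: c.id, score };
  }
  return best?.id;
}

const winner = mostSimilar([
  { id: "m1", scores: { title: 0.9, genre: 0.2, plot: 0.1 } },
  { id: "m2", scores: { title: 0.3, genre: 0.8, plot: 0.7 } },
]);
console.log(winner); // m2
```

With these example weights a movie with a closely matching plot and genre outranks one that only has a similar title.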
Admin Panel
- Visit /admin for movie management
- Create, update, or delete movies
- Export movies as JSON or CSV
- View analytics and statistics
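To give a sense of what a CSV export involves, the following sketch serializes a list of movies with RFC 4180 style quoting. The `MovieRow` shape, the column set, and the `|` separator for genres are assumptions made for the example; the project's export format may differ.

```typescript
// Hypothetical movie shape for illustration.
interface MovieRow {
  title: string;
  year: number;
  genres: string[];
}

// Quote a field when it contains a comma, double quote, or line break,
// doubling any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function moviesToCsv(movies: MovieRow[]): string {
  const header = "title,year,genres";
  const rows = movies.map((m) =>
    [m.title, String(m.year), m.genres.join("|")].map(csvField).join(","),
  );
  return [header, ...rows].join("\r\n");
}

const csv = moviesToCsv([
  { title: "Back to the Future", year: 1985, genres: ["Adventure", "Comedy"] },
  { title: "Crouching Tiger, Hidden Dragon", year: 2000, genres: ["Action"] },
]);
console.log(csv);
```

The quoting step matters for titles that contain commas, which would otherwise shift columns when the file is opened in a spreadsheet.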
API Endpoints
Base URL
http://localhost:4000/api
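Since individual routes are not listed here, the sketch below only shows how a client might compose request URLs from the base URL. The `/movies/search` path and the `q` and `limit` parameters are hypothetical placeholders, not documented endpoints.

```typescript
// Base URL from this README; paths and parameter names below are hypothetical.
const API_BASE_URL = "http://localhost:4000/api";

function buildApiUrl(
  path: string,
  params: Record<string, string | number | undefined> = {},
  base: string = API_BASE_URL,
): string {
  // Join base and path with exactly one slash between them.
  const url = new URL(`${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset parameters so they do not appear as "undefined" in the URL.
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const searchUrl = buildApiUrl("/movies/search", { q: "movies about time travel", limit: 5 });
console.log(searchUrl);
```

The resulting string can be passed to `fetch`. In the frontend, the base would come from `NEXT_PUBLIC_API_BASE_URL` rather than a hard-coded constant.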
Project Structure
Adv Db project/
├── backend/                 # Backend API server
│   ├── src/
│   │   ├── app.ts           # Express app configuration
│   │   ├── index.ts         # Server entry point
│   │   ├── config/          # Configuration files
│   │   ├── controllers/     # Request handlers
│   │   ├── models/          # Database models (Mongoose)
│   │   ├── routes/          # API routes
│   │   ├── services/        # Business logic
│   │   ├── middleware/      # Express middleware
│   │   ├── utils/           # Utility functions
│   │   └── scripts/         # Utility scripts (seeding, testing)
│   ├── package.json
│   └── tsconfig.json
│
└── frontend/                # Frontend application
    ├── src/
    │   ├── app/             # Next.js app router pages
    │   ├── components/      # React components
    │   ├── lib/             # API clients
    │   └── types/           # TypeScript types
    ├── package.json
    └── tsconfig.json
Configuration
Environment Variables
Backend (backend/.env)
| Variable | Description | Required | Default |
|---|---|---|---|
| NODE_ENV | Environment mode | No | development |
| PORT | Server port | No | 4000 |
| MONGODB_URI | MongoDB connection string | Yes | - |
| VECTOR_DB_PROVIDER | Vector DB provider (qdrant, pinecone, chroma) | No | qdrant |
| VECTOR_DB_URL | Vector DB URL | Yes (for Qdrant/Chroma) | - |
| VECTOR_DB_API_KEY | Vector DB API key | Yes (for cloud) | - |
| VECTOR_COLLECTION | Vector collection name | No | movies |
| VECTOR_DIMENSION | Vector dimension size | No | 384 |
| EMBEDDING_MODEL | Embedding model name | No | sentence-transformers/all-MiniLM-L6-v2 |
| TMDB_API_KEY | TMDB API key for seeding | No | - |
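The defaults and required flags for the backend variables can be expressed as a small loader, sketched below. The stack lists Zod for validation, so the real project may well validate its environment with a Zod schema. This sketch uses plain checks to stay dependency-free, and the `BackendConfig` and `loadConfig` names are hypothetical.

```typescript
type VectorProvider = "qdrant" | "pinecone" | "chroma";

interface BackendConfig {
  nodeEnv: string;
  port: number;
  mongodbUri: string;
  vectorDbProvider: VectorProvider;
  vectorDbUrl: string | undefined;
  vectorDbApiKey: string | undefined;
  vectorCollection: string;
  vectorDimension: number;
  embeddingModel: string;
  tmdbApiKey: string | undefined;
}

const PROVIDERS: VectorProvider[] = ["qdrant", "pinecone", "chroma"];

function loadConfig(env: Record<string, string | undefined>): BackendConfig {
  const mongodbUri = env.MONGODB_URI;
  if (!mongodbUri) throw new Error("MONGODB_URI is required");

  const provider = (env.VECTOR_DB_PROVIDER ?? "qdrant") as VectorProvider;
  if (!PROVIDERS.includes(provider)) {
    throw new Error(`Unsupported VECTOR_DB_PROVIDER: ${provider}`);
  }
  // VECTOR_DB_URL is marked as required for Qdrant and Chroma.
  if (provider !== "pinecone" && !env.VECTOR_DB_URL) {
    throw new Error(`VECTOR_DB_URL is required for ${provider}`);
  }

  const port = Number(env.PORT ?? "4000");
  const vectorDimension = Number(env.VECTOR_DIMENSION ?? "384");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("PORT must be a positive integer");
  }
  if (!Number.isInteger(vectorDimension) || vectorDimension <= 0) {
    throw new Error("VECTOR_DIMENSION must be a positive integer");
  }

  return {
    nodeEnv: env.NODE_ENV ?? "development",
    port,
    mongodbUri,
    vectorDbProvider: provider,
    vectorDbUrl: env.VECTOR_DB_URL,
    vectorDbApiKey: env.VECTOR_DB_API_KEY,
    vectorCollection: env.VECTOR_COLLECTION ?? "movies",
    vectorDimension,
    embeddingModel: env.EMBEDDING_MODEL ?? "sentence-transformers/all-MiniLM-L6-v2",
    tmdbApiKey: env.TMDB_API_KEY,
  };
}

const appConfig = loadConfig({
  MONGODB_URI: "mongodb://localhost:27017/movies",
  VECTOR_DB_URL: "http://localhost:6333",
});
console.log(appConfig.port, appConfig.vectorCollection, appConfig.vectorDimension);
```

In a Node.js process the function would typically be called once at startup with `process.env`, so that a missing required variable fails fast.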
Frontend (frontend/.env.local)
| Variable | Description | Required |
|---|---|---|
| NEXT_PUBLIC_API_BASE_URL | Backend API base URL | Yes |
Development
Available Scripts
Backend
npm run dev # Start development server with hot reload
npm run build # Build for production
npm start # Start production server
npm run lint # Type check TypeScript
npm run seed:movies # Seed database with TMDB movies
npm run import:movies # Import movies from JSON file
npm run test:embedding # Test embedding generation
npm run test:vectors # Test vector database operations
Frontend
npm run dev # Start development server
npm run build # Build for production
npm start # Start production server
npm run lint # Run ESLint
Vector Search Workflow
The system generates three types of embeddings for each movie:
- Title embedding - Vector representation of the movie title
- Genre embedding - Vector representation of genres
- Plot embedding - Vector representation of the plot
These vectors are stored in Qdrant for semantic search. For detailed workflow documentation, see QDRANT_VECTOR_SAVING_WORKFLOW.md.
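One way to keep three embeddings per movie in a single Qdrant point is to use named vectors. The sketch below builds the JSON body for Qdrant's point upsert request (`PUT /collections/{collection}/points`). Whether the project actually uses named vectors, and the vector names `title`, `genre`, and `plot`, are assumptions; the referenced workflow document is the authority on the real layout.

```typescript
const VECTOR_DIMENSION = 384; // matches all-MiniLM-L6-v2

type MovieEmbeddings = {
  title: number[];
  genre: number[];
  plot: number[];
};

interface QdrantPoint {
  id: number | string; // Qdrant accepts unsigned integers or UUID strings
  vector: Record<string, number[]>;
  payload: Record<string, unknown>;
}

// Build one point carrying three named vectors. The vector names and the
// payload fields are hypothetical; the project may store them differently.
function buildMoviePoint(
  id: number | string,
  embeddings: MovieEmbeddings,
  payload: Record<string, unknown>,
): QdrantPoint {
  for (const [field, vec] of Object.entries(embeddings)) {
    if (vec.length !== VECTOR_DIMENSION) {
      throw new Error(
        `${field} embedding has ${vec.length} dimensions, expected ${VECTOR_DIMENSION}`,
      );
    }
  }
  return {
    id,
    vector: { title: embeddings.title, genre: embeddings.genre, plot: embeddings.plot },
    payload,
  };
}

// Request body for the upsert endpoint.
function buildUpsertBody(points: QdrantPoint[]): { points: QdrantPoint[] } {
  return { points };
}

// Placeholder vectors; real values would come from the embedding model.
const zeros = new Array<number>(VECTOR_DIMENSION).fill(0);
const upsertBody = buildUpsertBody([
  buildMoviePoint(1, { title: zeros, genre: zeros, plot: zeros }, { title: "Example Movie" }),
]);
console.log(Object.keys(upsertBody.points[0].vector));
```

Checking the dimension before sending avoids a rejected request when the embedding model and VECTOR_DIMENSION disagree.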
Production Deployment
Backend
- Build the application:

      cd backend
      npm run build

- Set production environment variables
- Start the server:

      npm start

Frontend
- Build the application:

      cd frontend
      npm run build

- Start the production server:

      npm start

License
This project is licensed under the MIT License.
Acknowledgments
- The Movie Database (TMDB) - For movie data and images
- Hugging Face - For the sentence transformer models
- Qdrant - For the vector database solution
- Transformers.js - For local embedding generation
Built with ❤️ for the Advanced Database course