Netflix Movies & TV Shows - Recommendation System

A data analysis project and content-based recommendation system for Netflix movies and TV shows, written in Python. It performs exploratory data analysis (EDA) on the Netflix catalog and builds a recommendation engine from TF-IDF text features and cosine similarity.

Overview
Features
Dataset
Installation
Usage
Project Structure
Technologies Used
Key Findings
Recommendation System
Results
Future Improvements
Contributing
License
Contact

Overview

This project analyzes Netflix's content library to uncover insights about content distribution, catalog composition, and trends over time. It includes a content-based recommendation system that suggests similar movies and TV shows based on features such as genres, cast, director, and descriptions.

What Makes This Project Special?

Comprehensive EDA: Deep dive into Netflix's content with 20+ visualizations
Advanced Feature Engineering: 14+ engineered features from raw data
Statistical Analysis: Hypothesis testing to validate insights
Smart Recommendations: TF-IDF and cosine similarity-based recommendation engine
Professional Visualizations: Netflix-themed color palette and styling
Interactive Analysis: Ready-to-use Jupyter notebook with detailed explanations

Features

Exploratory Data Analysis
- Content distribution (Movies vs TV Shows)
- Geographic analysis (content by country)
- Temporal trends (release years, addition patterns)
- Genre and rating analysis
- Duration analysis
Statistical Testing
- Release year distribution analysis
- Content type vs rating association
- Geographic content representation
- Temporal trend analysis
Recommendation System
- Content-based filtering
- TF-IDF vectorization (5,000 features)
- Cosine similarity matching
- Multi-feature recommendations (genre, cast, director, description)
Professional Visualizations
- Netflix-branded color scheme
- Interactive plots
- Clean, publication-ready figures
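
The "content type vs rating association" item under Statistical Testing is the kind of question a chi-square test of independence answers. The sketch below computes the statistic by hand with the standard library so each step is visible. It is an illustration only: the notebook may use a different test or a library routine such as `scipy.stats.chi2_contingency`, and the counts in `observed` are made-up placeholders, not figures from the dataset.

```python
def chi_square_statistic(observed):
    """Return (chi2, degrees_of_freedom) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if content type and rating were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Hypothetical counts (assumption): rows = Movie, TV Show; columns = three rating groups.
observed = [
    [30, 50, 20],
    [40, 10, 50],
]
chi2, dof = chi_square_statistic(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}")
```

A statistic that is large relative to the chi-square distribution with `dof` degrees of freedom suggests that type and rating are associated. Turning it into a p-value requires the chi-square distribution function, which SciPy provides.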

Dataset

Source: Netflix Movies and TV Shows Dataset on Kaggle

Details:

8,807 titles (Movies and TV Shows)
12 features: show_id, type, title, director, cast, country, date_added, release_year, rating, duration, listed_in, description
Date Range: Content added from 2008 to 2021
Coverage: Global content from 100+ countries
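
Before any analysis, it can help to confirm that a downloaded file has the 12 columns listed above and to see which of them contain empty cells. The following is a minimal standard-library sketch. The two rows in `SAMPLE_CSV` are invented stand-ins for the real file, and the helper name `summarize` is hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in", "description",
]

# Two made-up rows (assumption) standing in for the real CSV; the second lacks director and country.
SAMPLE_CSV = """show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
s1,Movie,Example Film,Jane Doe,"Actor A, Actor B",United States,"September 25, 2021",2020,PG-13,90 min,"Dramas, Comedies",A placeholder description.
s2,TV Show,Example Series,,"Actor C",,"September 24, 2021",2021,TV-MA,2 Seasons,TV Dramas,Another placeholder description.
"""


def summarize(csv_file):
    """Check the header against the expected schema and count empty cells per column."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    missing = {name: 0 for name in EXPECTED_COLUMNS}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in EXPECTED_COLUMNS:
            if not (row[name] or "").strip():
                missing[name] += 1
    return n_rows, missing


n_rows, missing = summarize(io.StringIO(SAMPLE_CSV))
print(n_rows, {name: count for name, count in missing.items() if count})
```

To run the same check on the real dataset, pass an open file handle for the downloaded CSV instead of the `io.StringIO` wrapper.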

Installation

Prerequisites

Python 3.8 or higher
Jupyter Notebook or JupyterLab
Kaggle account (for dataset download)

Setup

Clone the repository

git clone https://github.com/NusratBegum/Netflix-Recommendation-System-in-Python.git
cd Netflix-Recommendation-System-in-Python

Install required packages

pip install -r requirements.txt

Or install packages individually:

pip install numpy pandas matplotlib seaborn scikit-learn scipy kagglehub jupyter

Configure Kaggle credentials (if needed)

# Place your kaggle.json in ~/.kaggle/
mkdir -p ~/.kaggle
cp /path/to/kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

Launch Jupyter Notebook
```
jupyter notebook main.ipynb
```

Usage

Quick Start

Open main.ipynb and run all cells to:

Download and load the Netflix dataset
Perform comprehensive EDA
Build the recommendation system
Get recommendations for any title in the catalog

Get Recommendations

# Use the recommendation function
recommend_netflix('Stranger Things')

# Or use the detailed function
recommendations = get_recommendations('Breaking Bad', top_n=10)

Example Output

NETFLIX RECOMMENDATIONS FOR: 'STRANGER THINGS'
================================================

Top 10 Similar Titles:

1. Nightflyers
   Type: TV Show | Genre: TV Horror, TV Mysteries, TV Sci-Fi & Fantasy
   Rating: TV-MA | Year: 2018 | Match: 56.2%

2. Helix
   Type: TV Show | Genre: TV Horror, TV Mysteries, TV Sci-Fi & Fantasy
   Rating: TV-MA | Year: 2015 | Match: 55.3%
...

Project Structure

Netflix-Recommendation-System-in-Python/
│
├── main.ipynb              # Main Jupyter notebook with complete analysis
├── README.md               # Project documentation (this file)
└── .gitignore              # Git ignore file

Notebook Sections

Data Loading & Initial Exploration
- Library imports
- Dataset loading
- Initial data inspection
Feature Types Analysis
- Data types examination
- Feature categorization
Data Cleaning & Preprocessing
- Missing value handling
- Data type conversions
- Text preprocessing
Feature Engineering
- Date feature extraction
- Duration parsing
- Genre and country splitting
Exploratory Data Analysis
- Distribution analysis
- Temporal trends
- Geographic insights
- Genre analysis
Hypothesis Testing
- Statistical tests
- Trend validation
Content-Based Recommendation System
- TF-IDF vectorization
- Similarity computation
- Recommendation function
Conclusions & Insights
- Key findings
- Summary report
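
The Feature Engineering section listed above covers duration parsing and genre and country splitting. A minimal sketch of those two steps follows, using only the standard library. It assumes `duration` values look like `90 min` or `2 Seasons` and that multi-valued fields are comma-separated, as in the example output earlier. The helper names `parse_duration` and `split_multi` are hypothetical and not taken from the notebook.

```python
import re


def parse_duration(duration):
    """Split a duration string into (value, unit), e.g. '90 min' -> (90, 'min')."""
    match = re.fullmatch(r"\s*(\d+)\s+(min|Seasons?)\s*", duration or "")
    if match is None:
        return None, None  # missing value or unexpected format
    value, unit = int(match.group(1)), match.group(2)
    return value, "season" if unit.startswith("Season") else "min"


def split_multi(value):
    """Split a comma-separated field such as listed_in or country into a clean list."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


print(parse_duration("90 min"))
print(parse_duration("2 Seasons"))
print(split_multi("TV Horror, TV Mysteries, TV Sci-Fi & Fantasy"))
```

Returning `(None, None)` for unparseable input keeps missing durations explicit, so they can be counted or filtered rather than silently treated as zero.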

Technologies Used

Python 3.8+: Core programming language
Pandas: Data manipulation and analysis
NumPy: Numerical computing
Matplotlib & Seaborn: Data visualization
Scikit-learn: Machine learning and NLP
- TfidfVectorizer: Text feature extraction
- Cosine Similarity: Content matching
SciPy: Statistical analysis
Kagglehub: Dataset management
Jupyter: Interactive development environment

Key Findings

Content Distribution

Movies: 6,131 titles (69.6%)
TV Shows: 2,676 titles (30.4%)
Netflix has been increasingly adding TV Shows in recent years
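
The percentages above follow directly from the two counts. As a quick check, this sketch rebuilds a `type` column from the reported totals and recomputes each share. The function name `type_shares` is hypothetical.

```python
from collections import Counter


def type_shares(types):
    """Return {label: (count, percent rounded to one decimal)} for a list of labels."""
    counts = Counter(types)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Rebuild the type column from the counts reported in this README.
types = ["Movie"] * 6131 + ["TV Show"] * 2676
print(type_shares(types))
```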

Geographic Insights

Top Producer: United States (3,211 titles)
Other Major Producers: India, United Kingdom
Content from 100+ countries represented

Content Characteristics

Most Common Rating: TV-MA (Mature Audiences)
Top Genres: International Movies, Dramas, Comedies
Average Movie Duration: ~100 minutes
Average TV Show Seasons: 1.8 seasons

Temporal Patterns

Peak content addition: End of year (Q4)
Most additions on Fridays
Release years span 1925-2021
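
Findings such as the quarter and weekday patterns above depend on features derived from `date_added`. The sketch below shows one way to derive them, assuming the column holds English strings in the form `Month D, YYYY`. The sample dates are made up for illustration, and `date_features` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def date_features(date_added):
    """Parse a 'Month D, YYYY' string and derive year, quarter, and weekday name."""
    parsed = datetime.strptime(date_added.strip(), "%B %d, %Y")
    return {
        "year": parsed.year,
        "quarter": (parsed.month - 1) // 3 + 1,
        "weekday": WEEKDAYS[parsed.weekday()],  # weekday() returns 0 for Monday
    }


# Made-up sample values (assumption) in the assumed format.
samples = ["September 24, 2021", "December 31, 2019", "January 1, 2016"]
features = [date_features(s) for s in samples]
weekday_counts = Counter(f["weekday"] for f in features)
quarter_counts = Counter(f["quarter"] for f in features)
print(features[0])
print(weekday_counts.most_common(1), quarter_counts.most_common(1))
```

Counting the derived `weekday` and `quarter` values over the full column gives the kind of tallies summarized in this section.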

Recommendation System

How It Works

The recommendation system uses Content-Based Filtering:

Feature Extraction: Combines multiple text features
- Genres (listed_in)
- Cast members
- Director
- Country
- Rating
- Description
Text Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency)
- Produces a matrix with 5,000 feature columns
- Weights terms that are distinctive to a title more heavily than terms common across the catalog
Similarity Computation: Cosine Similarity
- Measures content similarity (0-1 scale)
- Higher scores = more similar content
Recommendation Generation
- Ranks all titles by similarity
- Returns top N matches

Performance

Matrix Size: 8,807 titles × 5,000 features
Computation Time: < 1 second per recommendation
Relevance: Judged qualitatively from shared content features; no quantitative accuracy metric is reported
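
The matrix dimensions above also give a rough idea of memory use. The arithmetic below assumes dense storage with 8-byte floats. Whether the notebook precomputes the full title-by-title similarity matrix is an assumption, and TF-IDF matrices are normally stored in sparse form, which takes far less space than the dense figure shown.

```python
def dense_matrix_mib(n_rows, n_cols, bytes_per_value=8):
    """Size in MiB of a dense matrix, assuming 8-byte floats by default."""
    return n_rows * n_cols * bytes_per_value / 2**20


n_titles, n_features = 8807, 5000
print(f"TF-IDF matrix if stored densely: {dense_matrix_mib(n_titles, n_features):.0f} MiB")
print(f"full pairwise similarity matrix: {dense_matrix_mib(n_titles, n_titles):.0f} MiB")
```

If memory is tight, one alternative is to compute similarities for a single query row at a time, which needs only 8,807 values per recommendation.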

Results

The project successfully:

Analyzed 8,807 Netflix titles with comprehensive visualizations
Identified key trends in Netflix's content strategy
Validated hypotheses using statistical tests
Built a functional recommendation system
Generated relevant recommendations for various content types

Sample Recommendations

For "Stranger Things" (Sci-Fi TV Show):

Nightflyers (56.2% match)
Helix (55.3% match)
Chilling Adventures of Sabrina (54.4% match)

For "Breaking Bad" (Crime Drama):

Dare Me (51.1% match)
The Lizzie Borden Chronicles (49.9% match)
Ozark (47.5% match)

Future Improvements

Implement collaborative filtering using user data
Add sentiment analysis on descriptions
Create hybrid recommendation system
Build interactive web application (Flask/Streamlit)
Include user ratings and reviews
Add real-time trending analysis
Implement deep learning models (neural networks)
Create API for recommendations

Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Nusrat Begum

GitHub: @NusratBegum
Project Link: https://github.com/NusratBegum/Netflix-Recommendation-System-in-Python

Acknowledgments

Dataset provided by Shivam Bansal on Kaggle
Inspired by Netflix's recommendation algorithms
Thanks to the open-source community

If you found this project helpful, please consider giving it a star!

Last Updated: December 2025
