Data Science Assignments (Spring 2025)
This repository contains all problem sets (psets) for the Data Science course held in Spring 2025 at the Computer Science Department of Shahid Beheshti University, taught by Dr. Saeid Reza Kherad Pisheh. Each problem set includes two main components:
- Question Set (document.pdf): A PDF containing the problem statements.
- Answer Set: Solutions provided as a Jupyter Notebook, alongside a PDF report of the notebook's analysis. In some cases (e.g., pset0), the solution is a standalone PDF file.
Below is a high-level summary of the repository structure, followed by details for each problem set: a brief summary, the techniques applied, and direct links to important files.
Repository Structure
├── pset0
│   ├── document.pdf
│   └── pset0_solution.pdf
│
├── pset1
│   ├── document.pdf
│   ├── Amazon Sales Analysis
│   │   ├── amazon_sales_analysis.ipynb
│   │   └── amazon_sales_analysis.pdf
│   └── Customer Personality Analysis
│       ├── customer_personality_analysis.ipynb
│       └── customer_personality_analysis.pdf
│
├── pset2
│   ├── document.pdf
│   ├── youtube_tranding_videos_analysis.ipynb
│   ├── youtube_tranding_videos_analysis.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
├── pset3
│   ├── document.pdf
│   ├── user_segmentation_brazillian_ecommerce.ipynb
│   ├── user_segmentation_brazillian_ecommerce.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
├── pset4
│   ├── document.pdf
│   ├── disease_detection.ipynb
│   ├── disease_detection.pdf
│   └── Theoretical
│       └── theoretical.pdf
│
└── pset5
    ├── document.pdf
    ├── insurance_policy_cost_prediction.ipynb
    ├── insurance_policy_cost_prediction.pdf
    └── Theoretical
        └── theoretical.pdf
pset0
Summary:
Introductory exercises focusing on data loading, cleaning, summary statistics, and simple visualizations using pandas and matplotlib.
Techniques Applied:
- Writing a Formal Data Analysis Report (including partitioning the report into sections such as Abstract, Introduction, Data, Methodology, and Conclusion)
Question Set:
Answer Set:
pset1
Summary:
This set includes two independent analyses:
- Amazon Sales Analysis: Time-series and categorical analysis of Amazon sales data, covering revenue trends, product/category comparisons, and forecasting using regression.
- Customer Personality Analysis: Clustering and personality segmentation using survey and spending data, with RFM features, K-means clustering, and PCA-based visualization.
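As an illustration of the RFM (recency, frequency, monetary) features mentioned above, here is a minimal sketch using pandas. The toy transactions and the column names (`customer_id`, `order_date`, `amount`) are assumptions made for the example; they are not the schema of the dataset used in the notebook, and the notebook may compute these features differently.

```python
import pandas as pd

# Hypothetical toy transactions; column names are assumptions, not the real schema.
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "order_date": pd.to_datetime([
        "2025-01-05", "2025-03-01", "2025-02-10",
        "2025-01-20", "2025-02-15", "2025-03-10",
    ]),
    "amount": [20.0, 35.0, 50.0, 10.0, 15.0, 30.0],
})

# Reference date: one day after the most recent order in the data.
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

# One row per customer: days since last order, number of orders, total spend.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)
print(rfm)
```

The resulting table has one row per customer and can be scaled and passed to a clustering step such as K-means.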
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Hypothesis Testing
Question Set:
Amazon Sales Analysis
Answer Set:
Customer Personality Analysis
Answer Set:
pset2
Summary:
Analysis of trending YouTube video data, covering feature extraction, correlation between engagement metrics, and linear regression for view prediction.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Hypothesis Testing
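To make the correlation and linear regression steps from the summary concrete, the following sketch uses NumPy on synthetic data. The `likes` and `views` arrays are made-up stand-ins for engagement metrics, and the linear relationship between them is an assumption of the example, not a finding from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for engagement metrics; real values will differ.
likes = rng.uniform(100, 10_000, size=300)
views = 50.0 * likes + rng.normal(scale=20_000, size=300)

# Pearson correlation between the two metrics.
r = np.corrcoef(likes, views)[0, 1]

# Ordinary least squares fit: views ~ slope * likes + intercept.
slope, intercept = np.polyfit(likes, views, deg=1)

print(f"r = {r:.3f}, slope = {slope:.1f}, intercept = {intercept:.0f}")
```

On real data, a strong correlation between predictors (for example likes and comment counts) is worth checking before interpreting regression coefficients.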
Question Set:
Answer Set:
pset3
Summary:
User segmentation via RFM analysis on a Brazilian e-commerce dataset, with K-means clustering, dendrogram-based validation, and customer lifetime insights.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Feature Engineering
- Clustering and Dimensionality Reduction (K-means, DBSCAN, PCA, t-SNE, Elbow Curve, etc.)
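The elbow curve listed above can be sketched as follows with scikit-learn. The data here is synthetic (three well-separated blobs standing in for scaled customer features), so the shape of the curve is an assumption of the example and not a result from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: three well-separated blobs, 50 points each.
centers = np.array([[0, 0], [10, 10], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
X = StandardScaler().fit_transform(X)

# Fit K-means for several k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

The "elbow" is the value of k after which inertia stops dropping sharply; for this synthetic data that happens at k = 3. Real customer data rarely shows such a clean bend, which is why a second check such as a dendrogram or silhouette score is useful.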
Question Set:
Answer Set:
pset4
Summary:
Classification model for disease prediction using medical data, covering preprocessing, a logistic regression or CNN model, metrics evaluation, and an ethical discussion.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Classification
  - Basic Methods:
    - LDA
    - Logistic Regression
    - Naive Bayes
    - SVM
  - Ensemble Methods:
    - Random Forests
    - AdaBoost
    - XGBoost
    - Voting
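As a rough illustration of how a voting ensemble combines several of the classifiers listed above, here is a sketch with scikit-learn's `VotingClassifier`. The data comes from `make_classification` and is purely synthetic; the choice of base models and hyperparameters is an assumption for the example and does not reflect the configuration used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data standing in for the medical dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

accuracy = accuracy_score(y_test, ensemble.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```

Soft voting requires every base model to expose `predict_proba`; with `voting="hard"` the ensemble takes a majority vote over predicted labels instead. For disease detection, accuracy alone is usually not enough, so recall and precision per class are worth reporting as well.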
Question Set:
Answer Set:
pset5
Summary:
Regression models for insurance policy cost prediction.
Techniques Applied:
- Exploratory Data Analysis
- Data Preprocessing
- Regression
  - Random Forests
  - AdaBoost
  - XGBoost
  - LightGBM
  - CatBoost
  - Polynomial Regression with Ridge Regularization
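Polynomial regression with a ridge penalty, the last item above, can be sketched as a scikit-learn pipeline. The data is synthetic (a noisy quadratic in one variable), and the degree and `alpha` values are illustrative assumptions rather than the settings used in the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic target: a quadratic function of x plus Gaussian noise.
x = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

# Expand to polynomial features, scale them, then fit a ridge-penalized model.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
model.fit(x, y)

r2 = model.score(x, y)
print(f"training R^2: {r2:.3f}")
```

Scaling before the ridge step matters because the penalty acts on coefficient size, and polynomial terms otherwise live on very different scales. In practice `alpha` and the degree would be chosen by cross-validation on held-out data rather than fixed up front.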
Question Set:
Answer Set: