deaneeth/churn-prediction-data-pipeline
Step-by-step EDA and data preprocessing journey for customer churn prediction. Updated weekly with raw & processed datasets, notebooks, and ML-ready pipeline.
Customer Churn Prediction: EDA & Data Preprocessing Pipeline
A beginner-to-intermediate-friendly project that walks you through the entire journey of preparing data for customer churn prediction, one notebook at a time. 🧪
This repo is updated weekly with:
- Clean, progressive Jupyter notebooks
- Raw & processed datasets
- Practical examples using Python & pandas
- Real-world-style applied EDA for churn modeling
What's Inside?
This project covers the complete preprocessing & EDA pipeline, built step-by-step:
| Notebook | Description |
|---|---|
| 0_handle_missing_values.ipynb | Identify and handle missing values using an LLM |
| 1_handle_outliers.ipynb | Detect and treat outliers |
| 2_feature_binning.ipynb | Group continuous numerical features into discrete bins or intervals |
| 3_feature_encoding.ipynb | Convert categorical or text data into a numerical format that machine learning models can use |
| 4_feature_scaling.ipynb | Standardize or normalize the range of the independent variables (features) |
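To make the five steps concrete, here is a minimal pandas sketch of the same sequence on a toy DataFrame. It is not taken from the notebooks: the column names (`Age`, `Balance`, `Gender`), the bin edges, and the choice of median fill, IQR clipping, and min-max scaling are all assumptions for illustration. In particular, the median fill is a simple stand-in for the LLM-based imputation used in the repo.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative, not the repo's schema.
df = pd.DataFrame({
    "Age": [25.0, 40.0, None, 33.0, 58.0, 29.0],
    "Balance": [1000.0, 2500.0, 1800.0, 2200.0, 90000.0, 1500.0],
    "Gender": ["F", "M", "M", "F", "F", "M"],
})

# 0. Missing values: fill with the column median
#    (a simple stand-in for LLM-assisted imputation).
df["Age"] = df["Age"].fillna(df["Age"].median())

# 1. Outliers: clip values to the 1.5 * IQR fences.
q1, q3 = df["Balance"].quantile([0.25, 0.75])
iqr = q3 - q1
df["Balance"] = df["Balance"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 2. Binning: group a continuous feature into labeled intervals.
#    The bin edges here are arbitrary assumptions.
df["AgeGroup"] = pd.cut(
    df["Age"], bins=[0, 30, 45, 120], labels=["young", "mid", "senior"]
)

# 3. Encoding: one-hot encode a categorical column as 0/1 integers.
df = pd.get_dummies(df, columns=["Gender"], dtype=int)

# 4. Scaling: min-max normalize a numeric column to the range [0, 1].
balance = df["Balance"]
df["BalanceScaled"] = (balance - balance.min()) / (balance.max() - balance.min())

print(df)
```

The order mirrors the notebook numbering: imputing first means the later steps never see `NaN`, and clipping outliers before scaling keeps a single extreme value from compressing the rest of the column toward zero.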
Folder structure:
- processed/: cleaned and transformed versions
- raw/: raw input dataset
- Notebooks: each preprocessing step in a separate notebook
🛠️ Tools Used
- Python, Pandas, Pydantic
- Groq LLM (for smart imputations)
- OpenAI API
- Matplotlib, Seaborn
- Scikit-learn (soon)
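One plausible way Pydantic and an LLM fit together for imputation is to validate each model reply against a schema before writing it into the dataset. The sketch below assumes the LLM is asked to answer with a small JSON object; the `ImputedAge` schema, the `parse_imputation` helper, and the 18 to 100 range are hypothetical and not taken from the notebooks, and no actual API call is made.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class ImputedAge(BaseModel):
    """Hypothetical schema for one LLM-suggested value."""

    age: int = Field(ge=18, le=100)  # assumed plausible range for a customer


def parse_imputation(raw_reply: str):
    """Return a validated age, or None if the reply is malformed or out of range."""
    try:
        return ImputedAge(**json.loads(raw_reply)).age
    except (json.JSONDecodeError, TypeError, ValidationError):
        # Not JSON, not a JSON object, or a value that fails the schema.
        return None


print(parse_imputation('{"age": 42}'))
print(parse_imputation('{"age": 250}'))
print(parse_imputation("I think the age is about 40."))
```

Returning `None` for a rejected reply lets the caller fall back to a conventional strategy, such as a median fill, instead of storing an implausible value.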
🎯 Goals
- Learn to preprocess churn data like a pro
- Understand applied EDA, not just charts
- Build a fully cleaned, ML-ready dataset
- Serve as a template for your own ML projects
Getting Started
To get started with this repo, clone the repository and install the required dependencies:
git clone https://github.com/deaneeth/churn-prediction-data-pipeline.git
cd churn-prediction-data-pipeline
pip install -r requirements.txt
Why You'll Like It
- Step-by-step and easy to follow
- LLM-assisted imputations (cool and practical!)
- 🧼 Realistic focus on data cleaning, not just modeling
- Includes raw and processed data files with step-by-step Jupyter notebooks
Contribute or Follow Along
- This repo is evolving week by week. Star ⭐ to stay updated. Fork 🍴 to experiment. Contributions and feedback are welcome!
Please read the contributing guidelines first.
Want to learn how data scientists actually clean data before modeling?
You're in the right place. Let's build this together.
Created with ❤️ by deaneeth
Created July 12, 2025
Updated September 18, 2025