📊 Customer Churn Prediction – EDA & Data Preprocessing Pipeline

A beginner-to-intermediate project that walks you through preparing data for customer churn prediction, one notebook at a time. 🧪

🚀 This repo is updated weekly with:

Clean, progressive Jupyter notebooks
Raw & processed datasets
Practical examples using Python & pandas
Real-world-style applied EDA for churn modeling

🧭 What's Inside?

This project covers the complete preprocessing & EDA pipeline, built step-by-step:

Notebook	Description
`0_handle_missing_values.ipynb`	Identify & handle missing values using an LLM
`1_handle_outliers.ipynb`	Detect & treat outliers
`2_feature_binning.ipynb`	Group continuous numerical features into discrete bins or intervals
`3_feature_encoding.ipynb`	Convert categorical or text data into a numerical format that machine learning models can use
`4_feature_scaling.ipynb`	Standardize or normalize the range of numerical features
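
To make the later steps concrete, here is a minimal pandas sketch of outlier treatment, binning, encoding, and scaling. It is not taken from the notebooks: the toy data, the column names (`tenure_months`, `monthly_charges`, `contract`), the bin edges, and the choice of IQR clipping and min-max scaling are all assumptions made for illustration.

```python
import pandas as pd

# Toy data; column names and values are hypothetical, not the repo's dataset.
df = pd.DataFrame({
    "tenure_months": [1, 12, 24, 36, 60],
    "monthly_charges": [20.0, 55.0, 70.0, 85.0, 900.0],  # 900.0 is an obvious outlier
    "contract": ["monthly", "yearly", "monthly", "two_year", "yearly"],
})

# 1. Outliers: clip values that fall more than 1.5 * IQR outside the quartiles.
q1, q3 = df["monthly_charges"].quantile([0.25, 0.75])
iqr = q3 - q1
df["monthly_charges"] = df["monthly_charges"].clip(
    lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr
)

# 2. Binning: group a continuous feature into labeled intervals.
df["tenure_group"] = pd.cut(
    df["tenure_months"],
    bins=[0, 12, 36, 72],  # assumed bin edges
    labels=["new", "established", "loyal"],
)

# 3. Encoding: one-hot encode a categorical column.
df = pd.get_dummies(df, columns=["contract"], dtype=int)

# 4. Scaling: min-max scale a numeric column to the [0, 1] range.
col = df["monthly_charges"]
df["monthly_charges_scaled"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

Clipping keeps every row and only caps extreme values, which is one of several reasonable treatments; dropping rows or applying a log transform are common alternatives.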

📁 Folder structure:

- 📂 processed/ → cleaned and transformed versions
- 📂 raw/ → raw input dataset
- 📓 Notebooks → each preprocessing step in a separate notebook
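
As a rough illustration of how a notebook step might move data from `raw/` to `processed/`, the sketch below reads a CSV, applies a placeholder cleaning step, and writes the result. The function `run_step`, the file name `churn.csv`, the `_cleaned` suffix, and the columns are hypothetical; the demo runs in a temporary directory so it does not touch real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def run_step(raw_path: Path, processed_dir: Path) -> Path:
    """Read a raw CSV, apply one cleaning step, and save it under processed_dir."""
    df = pd.read_csv(raw_path)
    df = df.drop_duplicates()  # placeholder for a real preprocessing step
    processed_dir.mkdir(parents=True, exist_ok=True)
    out_path = processed_dir / f"{raw_path.stem}_cleaned.csv"
    df.to_csv(out_path, index=False)
    return out_path


# Demo in a throwaway directory that mirrors the raw/ and processed/ layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    raw_file = root / "raw" / "churn.csv"  # hypothetical file name
    raw_file.write_text("customer_id,churned\n1,0\n1,0\n2,1\n")

    out = run_step(raw_file, root / "processed")
    cleaned = pd.read_csv(out)
    print(out.name, len(cleaned))
```

Keeping the raw file untouched and writing each step's output separately makes it easy to rerun any single step.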

🛠️ Tools Used

Python, pandas, Pydantic
Groq LLM (for smart imputations)
OpenAI API
Matplotlib, Seaborn
scikit-learn (coming soon)
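
One plausible way Pydantic and an LLM fit together is to validate the model's reply before writing an imputed value into the dataset. The sketch below is an assumption about that pattern, not the repo's code: the `AgeImputation` schema, its fields, the age range, and the helper `parse_imputation` are hypothetical, and the LLM reply is a hard-coded string rather than a real API call.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class AgeImputation(BaseModel):
    """Hypothetical schema for one imputed value returned by an LLM."""

    customer_id: int
    age: int = Field(ge=18, le=100)  # assumed plausible range


def parse_imputation(raw_response: str) -> Optional[AgeImputation]:
    """Return a validated imputation, or None if the reply is malformed or out of range."""
    try:
        return AgeImputation(**json.loads(raw_response))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None


# Stand-ins for LLM replies; no real API is called here.
good = parse_imputation('{"customer_id": 7, "age": 42}')
bad = parse_imputation('{"customer_id": 7, "age": 420}')

print(good)
print(bad)
```

Rejecting out-of-range or malformed replies means a bad LLM answer leaves the value missing instead of silently corrupting the column.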

🎯 Goals

Learn to preprocess churn data like a pro
Understand applied EDA, not just charts
Build a fully cleaned, ML-ready dataset
Serve as a template for your own ML projects

🚀 Getting Started

To get started with this repo, clone the repository and install the required dependencies:

git clone https://github.com/deaneeth/churn-prediction-data-pipeline.git
cd churn-prediction-data-pipeline
pip install -r requirements.txt

🌟 Why You’ll Like It

📚 Step-by-step & easy to follow
🧠 LLM-assisted imputations (cool and practical!)
🧼 Realistic focus on data cleaning, not just modeling
💾 Includes raw + processed data files with step-by-step Jupyter notebooks

🤝 Contribute or Follow Along

This repo is evolving week by week. Star ⭐ to stay updated. Fork 🍴 to experiment. Contributions & feedback welcome!
Please read the contributing guidelines first.

👀 Want to learn how data scientists actually clean data before modeling?

You’re in the right place. Let's build this together.

Created with ❤️ by deaneeth
