AfuaX/user_behavior_pipeline
A Python pipeline that generates synthetic user behavior data, trains a logistic regression model to predict churn, and produces evaluation reports with visualizations.
User Behavior Analytics Pipeline
Overview
This project simulates a user behavior analytics pipeline with synthetic data.
It demonstrates a complete machine learning workflow: data generation, model training, evaluation, and visualization.
Features:
- Automatically generates a synthetic dataset of user behavior.
- Trains a logistic regression model to predict churn.
- Evaluates performance using accuracy and a classification report.
- Saves visualizations: confusion matrix and feature histograms.
Folder Structure
user_behavior_pipeline/
├─ data/ # Contains dataset
│ └─ user_behavior.csv # Automatically generated
├─ output/ # Contains plots and reports
│ ├─ classification_report.txt
│ ├─ confusion_matrix.png
│ └─ user_behavior_hist.png
├─ src/ # Source code
│ ├─ churn_model.py # Main pipeline
│ └─ generate_data.py # Generates synthetic dataset
├─ requirements.txt # Python dependencies
└─ README.md # Project description and instructions
Setup
- Clone or download the project:
git clone https://github.com/AfuaX/user_behavior_pipeline
cd user_behavior_pipeline
- Create and activate a Python virtual environment:
Windows:
python -m venv venv
.\venv\Scripts\activate
macOS / Linux:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
Running the Project
Generate the dataset:
python src/generate_data.py
This will create data/user_behavior.csv automatically.
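As a rough illustration of what a generator like this might do, here is a minimal sketch. It is not the contents of generate_data.py: the function name `generate_user_behavior`, the column names, and the coefficients that tie features to churn are all assumptions made up for this example.

```python
import numpy as np
import pandas as pd


def generate_user_behavior(n_samples: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic user-behavior table with a binary churn label."""
    rng = np.random.default_rng(seed)

    # Hypothetical features; the real generate_data.py may use different ones.
    sessions_per_week = rng.poisson(lam=5, size=n_samples)
    avg_session_minutes = rng.gamma(shape=2.0, scale=10.0, size=n_samples)
    days_since_last_login = rng.integers(0, 60, size=n_samples)

    # Assumed relationship: churn gets more likely with inactivity
    # and less likely with frequent sessions.
    logit = 0.08 * days_since_last_login - 0.3 * sessions_per_week - 0.5
    churn_probability = 1.0 / (1.0 + np.exp(-logit))
    churned = rng.binomial(1, churn_probability)

    return pd.DataFrame({
        "sessions_per_week": sessions_per_week,
        "avg_session_minutes": avg_session_minutes.round(1),
        "days_since_last_login": days_since_last_login,
        "churned": churned,
    })


if __name__ == "__main__":
    df = generate_user_behavior()
    print(df.head())
    print("churn rate:", df["churned"].mean())
```

Writing the result with `df.to_csv("data/user_behavior.csv", index=False)` would produce a file in the location the pipeline expects. Fixing the seed keeps the dataset reproducible between runs.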
Run the main pipeline:
python src/churn_model.py
The script will:
- Load the dataset
- Perform train/test split
- Train a logistic regression model
- Evaluate model and save classification report
- Generate plots: confusion matrix & feature histogram
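The load, split, train, and evaluate steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in churn_model.py: the target column name `churned`, the 80/20 split, and the helper `demo_frame` (stand-in data so the example runs on its own) are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def train_and_evaluate(df: pd.DataFrame, target: str = "churned"):
    """Split the data, fit logistic regression, return (model, accuracy, report)."""
    X = df.drop(columns=[target])
    y = df[target]

    # Stratify so both splits keep roughly the same churn rate.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, zero_division=0)
    return model, accuracy, report


def demo_frame(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Stand-in data with made-up columns, used only to exercise the sketch."""
    rng = np.random.default_rng(seed)
    days_inactive = rng.integers(0, 60, size=n)
    sessions = rng.poisson(5, size=n)
    p = 1.0 / (1.0 + np.exp(-(0.1 * days_inactive - 0.4 * sessions)))
    return pd.DataFrame({
        "days_inactive": days_inactive,
        "sessions": sessions,
        "churned": rng.binomial(1, p),
    })


if __name__ == "__main__":
    _, acc, rep = train_and_evaluate(demo_frame())
    print(f"accuracy: {acc:.3f}")
    print(rep)
```

With the real dataset, `pd.read_csv("data/user_behavior.csv")` would take the place of `demo_frame()`, provided the CSV has a binary target column and numeric features.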
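Outputs
Running the pipeline writes these files to output/:
- classification_report.txt
- confusion_matrix.png
- user_behavior_hist.png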
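For readers curious how the confusion matrix and feature histogram images might be written, here is a minimal sketch using matplotlib alone. The function names, the label meaning (0 = retained, 1 = churned), and the figure styling are assumptions for illustration; the project's own script may build these plots differently.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix


def save_confusion_matrix(y_true, y_pred, path):
    """Plot a 2x2 confusion matrix and save it as an image."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(cm, cmap="Blues")
    ax.set_xticks([0, 1])
    ax.set_yticks([0, 1])
    # Assumed label meaning: 0 = retained, 1 = churned.
    ax.set_xticklabels(["retained", "churned"])
    ax.set_yticklabels(["retained", "churned"])
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    # Write the count into each cell (rows are actual, columns are predicted).
    for (row, col), count in np.ndenumerate(cm):
        ax.text(col, row, str(count), ha="center", va="center")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


def save_feature_histograms(features, path):
    """Draw one histogram per feature, side by side, and save the figure."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fig, axes = plt.subplots(
        1, len(features), figsize=(4 * len(features), 3), squeeze=False
    )
    for ax, (name, values) in zip(axes[0], features.items()):
        ax.hist(values, bins=20)
        ax.set_title(name)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

A call such as `save_confusion_matrix(y_test, y_pred, "output/confusion_matrix.png")` would create the output/ folder if it is missing. Since seaborn is among the dependencies, `seaborn.heatmap(cm, annot=True)` is a plausible alternative for the matrix plot.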
Notes
- The project is fully self-contained; no external dataset is required.
- You can modify generate_data.py to create more samples or add new features.
- Plots and reports are saved automatically in the output/ folder.
Dependencies
- pandas
- scikit-learn
- matplotlib
- seaborn
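To confirm that the packages listed above are installed in the active virtual environment, a small check like the following may help. It is a convenience sketch and not part of the project; the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they would appear in requirements.txt.
REQUIRED = ["pandas", "scikit-learn", "matplotlib", "seaborn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed by rerunning `pip install -r requirements.txt` inside the activated environment.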

