AfuaX/user_behavior_pipeline

A Python pipeline that generates synthetic user behavior data, trains a logistic regression model to predict churn, and produces evaluation reports with visualizations.

User Behavior Analytics Pipeline

Overview

This project simulates a user behavior analytics pipeline with synthetic data.
It demonstrates a complete machine learning workflow: data generation, model training, evaluation, and visualization.

Features:

  • Automatically generates a synthetic dataset of user behavior.
  • Trains a logistic regression model to predict churn.
  • Evaluates performance using accuracy and a classification report.
  • Saves visualizations: confusion matrix and feature histograms.

Folder Structure

user_behavior_pipeline/
├─ data/                  # Contains dataset
│  └─ user_behavior.csv   # Automatically generated
├─ output/                # Contains plots and reports
│  ├─ classification_report.txt
│  ├─ confusion_matrix.png
│  └─ user_behavior_hist.png
├─ src/                   # Source code
│  ├─ churn_model.py      # Main pipeline
│  └─ generate_data.py    # Generates synthetic dataset
├─ requirements.txt       # Python dependencies
└─ README.md              # Project description and instructions

Setup

  1. Clone or download the project:

git clone https://github.com/AfuaX/user_behavior_pipeline
cd user_behavior_pipeline

  2. Create and activate a Python virtual environment:

Windows:

python -m venv venv
.\venv\Scripts\activate

macOS / Linux:

python3 -m venv venv
source venv/bin/activate

  3. Install dependencies:

pip install -r requirements.txt

Running the Project

Generate the dataset:

python src/generate_data.py

This will create data/user_behavior.csv automatically.
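
For orientation, here is a minimal sketch of what a generator like this could look like. It is not the contents of src/generate_data.py: the column names, value ranges, and the rule linking features to churn are all assumptions made for illustration, and it uses only the standard library so it runs before the dependencies are installed.

```python
import csv
import math
import random
from pathlib import Path

# Hypothetical column names; the real generate_data.py may use different ones.
FIELDS = ["sessions_per_week", "avg_session_minutes", "days_since_last_login", "churn"]


def generate_rows(n_users=1000, seed=42):
    """Return n_users synthetic rows with a churn label loosely tied to the features."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    rows = []
    for _ in range(n_users):
        sessions = rng.randint(0, 20)
        minutes = round(rng.uniform(1.0, 60.0), 1)
        idle_days = rng.randint(0, 30)
        # Assumed relationship: inactivity raises churn risk, engagement lowers it.
        score = 0.15 * idle_days - 0.2 * sessions - 0.02 * minutes
        p_churn = 1.0 / (1.0 + math.exp(-score))
        rows.append({
            "sessions_per_week": sessions,
            "avg_session_minutes": minutes,
            "days_since_last_login": idle_days,
            "churn": int(rng.random() < p_churn),
        })
    return rows


def write_csv(rows, path):
    """Write rows to path as CSV, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling write_csv(generate_rows(5000), "data/user_behavior.csv") would produce a larger dataset in the expected location. Adding a feature means adding a key to each row and a matching entry in FIELDS.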

Run the main pipeline:

python src/churn_model.py

The script will:

  • Load the dataset
  • Perform a train/test split
  • Train a logistic regression model
  • Evaluate the model and save the classification report
  • Generate plots: confusion matrix and feature histograms
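
The load, split, train, and evaluate steps above can be sketched roughly as follows. This is an illustrative outline, not the code in src/churn_model.py: the function name run_pipeline, the target column name "churn", the 80/20 split, and the fixed random seed are all assumptions.

```python
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def run_pipeline(df, target="churn", report_path=None):
    """Split df, fit a logistic regression, and return (model, accuracy, report)."""
    X = df.drop(columns=[target])
    y = df[target]

    # Assumed 80/20 split; stratify keeps the churn ratio similar in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # max_iter is raised from the default so the solver has room to converge
    # on unscaled features.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, zero_division=0)

    # Optionally save the text report, creating the output folder if needed.
    if report_path is not None:
        report_path = Path(report_path)
        report_path.parent.mkdir(parents=True, exist_ok=True)
        report_path.write_text(f"Accuracy: {accuracy:.3f}\n\n{report}")

    return model, accuracy, report
```

With the generated dataset in place, run_pipeline(pd.read_csv("data/user_behavior.csv"), report_path="output/classification_report.txt") would mirror the steps listed above, provided every feature column is numeric.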

Outputs

  • Classification report: output/classification_report.txt
  • Confusion matrix: output/confusion_matrix.png
  • User behavior histogram: output/user_behavior_hist.png

Notes

  • The project is fully self-contained; no external dataset is required.
  • You can modify generate_data.py to create more samples or add new features.
  • Plots and reports are saved automatically in the output/ folder.

Dependencies

  • pandas
  • scikit-learn
  • matplotlib
  • seaborn