
Generous Tip Giver Prediction

Problem

Taxi ride-hailing platforms rely heavily on tips as a key component of driver income, yet passenger tipping behavior is highly variable and difficult to predict. This unpredictability limits the platform's ability to optimize driver–rider matching, incentives, and service quality. Large volumes of trip, fare, temporal, and behavioral data are generated but remain underutilized for tipping prediction. A data science and machine learning approach can identify patterns that distinguish generous tippers from others. Ultimately, this leads to higher service quality, better retention, and increased platform efficiency.

Solution

Built a full ML pipeline including:

  • Data ingestion & cleaning
  • Feature engineering
  • Model training (XGBoost, Random Forest, Logistic Regression)
  • FastAPI deployment
  • Dockerized application

📊 Dataset

  • Type: Yellow Taxi Trip dataset from Kaggle
  • Target: Generous Tipper
  • Features: 18 numerical and encoded categorical attributes
  • Size: 22,700 observations
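
The README does not state how the Generous Tipper target is derived. As a minimal sketch, assuming a hypothetical rule that a rider counts as generous when the tip is at least 20% of the fare, the label could be computed as below. The threshold, the function name `is_generous`, and the field names are illustrative assumptions, not taken from the repository.

```python
def is_generous(fare_amount: float, tip_amount: float, threshold: float = 0.20) -> int:
    """Return 1 if the tip is at least `threshold` of the fare, else 0.

    The 20% default is an assumed cut-off used for illustration only.
    """
    if fare_amount <= 0:
        # Avoid division by zero on zero-fare trips; treat them as not generous.
        return 0
    return int(tip_amount / fare_amount >= threshold)


# Made-up rows, not drawn from the dataset.
trips = [
    {"fare_amount": 10.0, "tip_amount": 2.5},  # 25% tip
    {"fare_amount": 20.0, "tip_amount": 1.0},  # 5% tip
    {"fare_amount": 0.0, "tip_amount": 0.0},   # zero fare
]
labels = [is_generous(t["fare_amount"], t["tip_amount"]) for t in trips]
print(labels)  # [1, 0, 0]
```

Encoding the target as 0/1 keeps it directly usable by the classifiers listed under Solution.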

Tech Stack

Python, Pandas, Scikit-learn, XGBoost, FastAPI, Docker

Architecture

generous-tipper/
│
├── data/                 # raw & processed data
├── config/               # data & training configurations
├── frontend/             # Core frontend logic with dockerization
├── notebooks/            # Training and data cleaning notebooks
├── src/                  # Core data, training and backend pipeline logic
├── tests/                # Basic unit tests of data, training, api pipelines
├── docker-compose.yaml   # dockerizing back and frontend with health check every 30 seconds
├── Dockerfile            # multi-step dockerization for clean containerization
├── pyproject.toml
├── README.md
└── LICENSE

🚀 Quick Start

git clone https://github.com/mohsinraza2999/generous-tipper.git
cd generous-tipper
python src/cli.py preprocess
python src/cli.py train
python src/cli.py route
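
The three subcommands above suggest a small dispatching entry point. The sketch below shows one way such a CLI could be structured with `argparse`. The handler functions and their return values are hypothetical placeholders and do not reproduce the contents of `src/cli.py`.

```python
import argparse


def preprocess() -> str:
    # Placeholder: the real step would clean the raw data.
    return "preprocess done"


def train() -> str:
    # Placeholder: the real step would fit and save a model.
    return "train done"


def route() -> str:
    # Placeholder: the real step would start the API server.
    return "route started"


# Map each subcommand name to its handler.
COMMANDS = {"preprocess": preprocess, "train": train, "route": route}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli.py", description="Pipeline entry point (sketch)")
    parser.add_argument("command", choices=sorted(COMMANDS), help="pipeline step to run")
    return parser


def main(argv=None) -> str:
    # Parse the given argument list (or sys.argv when None) and dispatch.
    args = build_parser().parse_args(argv)
    return COMMANDS[args.command]()


print(main(["train"]))  # train done
```

Keeping the dispatch table separate from the parser means a new step only needs one extra entry in `COMMANDS`.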

🔮 Making Predictions

python src/cli.py route

For the backend only, open the Swagger UI:

http://localhost:8000/docs

Example response:

{
  "prediction": "generous",
  "processed_at": "10-02-2026T07:30:21S",
  "latency_ms": 0.04
}
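
As a rough illustration of how a handler might assemble a payload with these three fields, the sketch below wraps a prediction call with a timestamp and a latency measurement. The names `build_response` and `dummy_predict`, the `tip_ratio` feature, and the `"not generous"` label are assumptions made for the example. The timestamp here uses ISO 8601, which differs from the format shown above.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def build_response(predict: Callable[[Dict[str, Any]], int], features: Dict[str, Any]) -> Dict[str, Any]:
    """Run a prediction and wrap it with a timestamp and latency in milliseconds."""
    start = time.perf_counter()
    label = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        # "not generous" is an assumed name for the negative class.
        "prediction": "generous" if label == 1 else "not generous",
        "processed_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "latency_ms": round(latency_ms, 2),
    }


def dummy_predict(features: Dict[str, Any]) -> int:
    # Stand-in for the trained classifier; an assumption for this example.
    return 1 if features.get("tip_ratio", 0.0) >= 0.2 else 0


response = build_response(dummy_predict, {"tip_ratio": 0.25})
print(response["prediction"])  # generous
```

A function like this returns a plain dictionary, so it could be returned directly from a FastAPI route handler.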

🧪 Testing

Run all unit and integration tests:

pip install pytest
pytest tests/

Tests cover:

  • Data preprocessing pipeline
  • API routes
  • Model inference behavior
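
To illustrate the kind of check a preprocessing test might contain, the sketch below tests a hypothetical `clean_trips` helper in pytest style. The helper, its cleaning rules, and the field names are assumptions for the example and are not the repository's code.

```python
def clean_trips(rows):
    """Drop rows with missing or negative fare/tip values (illustrative rule)."""
    cleaned = []
    for row in rows:
        fare = row.get("fare_amount")
        tip = row.get("tip_amount")
        if fare is None or tip is None:
            continue  # skip rows with missing values
        if fare < 0 or tip < 0:
            continue  # skip rows with negative amounts
        cleaned.append(row)
    return cleaned


def test_clean_trips_drops_invalid_rows():
    rows = [
        {"fare_amount": 12.0, "tip_amount": 2.0},
        {"fare_amount": -5.0, "tip_amount": 1.0},  # negative fare
        {"fare_amount": 8.0, "tip_amount": None},  # missing tip
    ]
    assert clean_trips(rows) == [{"fare_amount": 12.0, "tip_amount": 2.0}]


def test_clean_trips_keeps_valid_rows_unchanged():
    rows = [{"fare_amount": 3.5, "tip_amount": 0.0}]
    assert clean_trips(rows) == rows
```

pytest collects functions whose names start with `test_`, so a file like this placed under `tests/` would be picked up by `pytest tests/`.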

🧱 Docker Build

Build and run the backend and frontend containers together. A health check runs every 30 seconds.

docker-compose up --build
  1. Open the full application (frontend and backend) in a browser:
http://localhost:3000
  2. For the backend only, open the Swagger UI:
http://localhost:8000/docs

Example response:

{
  "prediction": "generous",
  "processed_at": "10-02-2026T07:30:21S",
  "latency_ms": 0.04
}

🔧 Configuration

  • All hyperparameters stored in YAML files
  • Data paths, training parameters, and inference behavior configurable
  • Environment-agnostic (local or containerized)
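
One way to keep a config-driven pipeline safe is to validate the loaded mapping before use. The sketch below converts a raw dictionary, such as one produced by PyYAML's `yaml.safe_load`, into a typed, immutable config. The class name, the keys, and the values are hypothetical and do not reflect the files in `config/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    model: str
    n_estimators: int
    test_size: float
    data_path: str


def parse_training_config(raw: dict) -> TrainingConfig:
    """Convert a raw mapping to a typed config and validate its values."""
    cfg = TrainingConfig(
        model=str(raw["model"]),
        n_estimators=int(raw["n_estimators"]),
        test_size=float(raw["test_size"]),
        data_path=str(raw["data_path"]),
    )
    if not 0.0 < cfg.test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    return cfg


# In practice `raw` would be read from a YAML file; these keys and values
# are made up for the example.
raw = {
    "model": "xgboost",
    "n_estimators": 200,
    "test_size": 0.2,
    "data_path": "data/processed/trips.csv",
}
config = parse_training_config(raw)
print(config.model, config.n_estimators)  # xgboost 200
```

Failing early on a bad value surfaces configuration mistakes before any training time is spent.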

🧠 Design Decisions & Trade-offs

  • Why classical machine learning?
    Because tree-based models perform well on tabular data. Neural networks were not chosen, so the project could focus on practicing model abstraction, extensibility, and deployment workflows.

  • Why config-driven pipelines?
    To separate experimentation from code changes and improve reproducibility.

  • Why both CLI and scripts?
    CLI serves developers; scripts support automation and CI.


Future Improvements

  • Model monitoring & drift detection
  • Cloud deployment

🧠 Key Learnings

  • ML systems should be designed as maintainable software
  • Testing pipelines prevents silent failures
  • Separation of training and inference is critical

📜 CI & Automation

  • GitHub Actions pipeline:
    • Runs tests on push
    • Ensures build stability
  • Docker build validation included

📬 Contact

Author: Mohsin Raza
Target Role: Machine Learning Engineer / AI Engineer
GitHub: github/mohsinraza2999
LinkedIn: linkedin/mohsin-raza