mohsinraza2999/generous-tipper
A production-level, modular data science project that aims to predict generous tippers for taxi drivers.
Generous Tip Giver Prediction
Problem
Taxi ride-hailing platforms rely heavily on tips as a key component of driver income, yet passenger tipping behavior is highly variable and difficult to predict. This unpredictability limits the platform's ability to optimize driver-rider matching, incentives, and service quality. Large volumes of trip, fare, temporal, and behavioral data are generated but remain underutilized for tipping prediction. A data science and machine learning approach can identify patterns that distinguish generous tippers from others. Ultimately, this leads to higher service quality, better retention, and increased platform efficiency.
Solution
Built a full ML pipeline including:
- Data ingestion & cleaning
- Feature engineering
- Model training (XGBoost, Random Forest, Logistic Regression)
- FastAPI deployment
- Dockerized application
Dataset
- Type: Yellow Taxi Trip dataset from Kaggle
- Target: Generous Tipper
- Features: Eighteen numerical and encoded categorical attributes
- Size: 22,700 observations
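This README does not say how the Generous Tipper target is derived. As a minimal sketch, the snippet below assumes a trip counts as generous when the tip is at least 20% of the pre-tip charge. The threshold and the field names `tip_amount` and `total_amount` are assumptions for illustration, not taken from the project's code.

```python
GENEROUS_THRESHOLD = 0.20  # assumed cut-off; not stated in this README


def is_generous(tip_amount: float, total_amount: float,
                threshold: float = GENEROUS_THRESHOLD) -> int:
    """Return 1 if the tip is at least `threshold` of the pre-tip charge, else 0."""
    pre_tip = total_amount - tip_amount
    if pre_tip <= 0:
        # No meaningful base charge, so the trip cannot be labelled generous.
        return 0
    return int(tip_amount / pre_tip >= threshold)


# Hypothetical trips using the assumed field names.
trips = [
    {"tip_amount": 5.0, "total_amount": 25.0},  # 5 / 20 = 25%
    {"tip_amount": 1.0, "total_amount": 21.0},  # 1 / 20 = 5%
    {"tip_amount": 0.0, "total_amount": 0.0},   # degenerate record
]
labels = [is_generous(t["tip_amount"], t["total_amount"]) for t in trips]
print(labels)
```

Keeping the threshold as a named constant makes it easy to move into a configuration file if the definition of "generous" changes.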
Tech Stack
Python, Pandas, Scikit-learn, XGBoost, FastAPI, Docker
Architecture
generous-tipper/
│
├── data/ # raw & processed data
├── config/ # data & training configurations
├── frontend/ # Core frontend logic with dockerization
├── notebooks/ # Training and data cleaning notebooks
├── src/ # Core data, training and backend pipeline logic
├── tests/ # Basic unit tests of data, training, api pipelines
├── docker-compose.yaml # runs backend and frontend containers with a health check every 30 seconds
├── Dockerfile # multi-stage build for clean containerization
├── pyproject.toml
├── README.md
└── LICENSE
Quick Start
git clone https://github.com/mohsinraza2999/generous-tipper.git
cd generous-tipper
python src/cli.py preprocess
python src/cli.py train
python src/cli.py route
Making Predictions
python src/cli.py route
For the backend only, with Swagger UI:
http://localhost:8000/docs
Example response:
{
"prediction": "generous",
"processed_at": "10-02-2026T07:30:21S",
"latency_ms": 0.04
}
Testing
Run all unit and integration tests:
pip install pytest
pytest tests/
Tests cover:
- Data preprocessing pipeline
- API routes
- Model inference behavior
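As an illustration of the kind of check such tests might contain, the sketch below validates the shape of the example response shown in this README. The helper `validate_response` and the test names are hypothetical and are not taken from the project's `tests/` directory.

```python
import json

# Field names come from the example response; the type rules are assumptions.
REQUIRED_FIELDS = {
    "prediction": str,
    "processed_at": str,
    "latency_ms": (int, float),
}


def validate_response(payload: dict) -> list:
    """Return a list of problems found in an API response (empty if valid)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for {name}")
    latency = payload.get("latency_ms")
    if isinstance(latency, (int, float)) and latency < 0:
        problems.append("latency_ms must be non-negative")
    return problems


def test_example_response_is_valid():
    body = ('{"prediction": "generous", '
            '"processed_at": "10-02-2026T07:30:21S", "latency_ms": 0.04}')
    assert validate_response(json.loads(body)) == []


def test_missing_fields_are_reported():
    assert validate_response({"prediction": "generous"}) == [
        "missing field: processed_at",
        "missing field: latency_ms",
    ]
```

Functions named `test_*` are collected automatically when `pytest` runs over the test directory.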
Docker Build
Build and run the backend and frontend containers. A health check runs every 30 seconds.
docker-compose up --build
- Open in a browser for both frontend and backend:
http://localhost:3000
- For the backend only, with Swagger UI:
http://localhost:8000/docs
Example response:
{
"prediction": "generous",
"processed_at": "10-02-2026T07:30:21S",
"latency_ms": 0.04
}
Configuration
- All hyperparameters stored in YAML files
- Data paths, training parameters, and inference behavior configurable
- Environment-agnostic (local or containerized)
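One way a config-driven setup like this can work is to overlay values read from a YAML file on a set of defaults. The sketch below shows only the merge step and starts from plain dictionaries to stay dependency-free. In practice the overrides would come from a parsed YAML file. All keys and values here are hypothetical, not the project's actual configuration.

```python
from copy import deepcopy

# Hypothetical defaults; the real keys live in the project's YAML files.
DEFAULTS = {
    "data": {"raw_path": "data/raw", "processed_path": "data/processed"},
    "training": {"model": "xgboost", "test_size": 0.2, "random_state": 42},
}


def merge_config(base: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on `base` without mutating either."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings, so merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-in for the contents of a parsed YAML override file.
overrides = {"training": {"model": "random_forest"}}
config = merge_config(DEFAULTS, overrides)
print(config["training"])
```

Because the merge returns a new dictionary, the same defaults can be reused across experiments without one run leaking settings into the next.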
Design Decisions & Trade-offs
- Why Machine Learning?
  Because tree-based models perform well on tabular data, neural networks were not chosen. The project focuses instead on practicing model abstraction, extensibility, and deployment workflows.
- Why config-driven pipelines?
  To separate experimentation from code changes and improve reproducibility.
- Why both CLI and scripts?
  CLI serves developers; scripts support automation and CI.
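The Quick Start commands (`preprocess`, `train`, `route`) suggest a subcommand-style CLI. The following is a minimal sketch of how such an entry point could be laid out with `argparse`. The handler functions and help strings are placeholders, and the actual `src/cli.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with one subcommand per pipeline stage."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("preprocess", help="clean data and build features")
    subparsers.add_parser("train", help="train and save a model")
    subparsers.add_parser("route", help="start the API server")
    return parser


# Placeholder handlers; real ones would call into the pipeline modules.
def run_preprocess() -> str:
    return "preprocess: done"


def run_train() -> str:
    return "train: done"


def run_route() -> str:
    return "route: started"


HANDLERS = {
    "preprocess": run_preprocess,
    "train": run_train,
    "route": run_route,
}


def main(argv=None) -> str:
    """Parse arguments and dispatch to the matching handler."""
    args = build_parser().parse_args(argv)
    return HANDLERS[args.command]()


print(main(["train"]))
```

Because each handler is an ordinary function, the same functions can be imported by automation scripts or CI jobs without going through the command line.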
Future Improvements
- Model monitoring & drift detection
- Cloud deployment
Key Learnings
- ML systems should be designed as maintainable software
- Testing pipelines prevents silent failures
- Separation of training and inference is critical
CI & Automation
- GitHub Actions pipeline:
  - Runs tests on push
  - Ensures build stability
- Docker build validation included
Contact
Author: Mohsin Raza
Target Role: Machine Learning Engineer / AI Engineer
GitHub: github/mohsinraza2999
LinkedIn: linkedin/mohsin-raza