
higorfct/MLOps-with-Naive-Bayes

MLOps project with MLflow

🚀 MLOps Project with Gaussian Naive Bayes

This repository presents a complete MLOps (Machine Learning Operations) pipeline, covering everything from data preparation to deployment and monitoring of a machine learning model in production.


🚀 How to use

  1. Clone the repository and install dependencies:
    git clone https://github.com/higorfct/MLOps-with-Naive-Bayes.git
    cd MLOps-with-Naive-Bayes
    pip install -r requirements.txt

📌 Objective

Automate and scale the lifecycle of a Gaussian Naive Bayes (GNB) model and its var_smoothing hyperparameter using modern MLOps practices: model versioning, experiment tracking, deployment via API, and reproducibility of results.

We apply this MLOps pipeline to the Credit.csv dataset from a German bank to predict whether a customer is a good or bad payer.
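A minimal sketch of how the dataset might be loaded and split before training. The target column name `class`, the 70/30 split, and the one-hot encoding of categorical columns are assumptions made for illustration; the repository does not confirm them, so adjust them to the real file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df, target="class", test_size=0.3, seed=42):
    """Split a credit DataFrame into train/test features and labels.

    `target` is a hypothetical column name; change it to match Credit.csv.
    """
    y = df[target]
    # One-hot encode categorical columns so GaussianNB receives numeric input.
    X = pd.get_dummies(df.drop(columns=[target]))
    # Stratify so both splits keep a similar good/bad payer ratio.
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )


# Usage (assumes Credit.csv sits in the working directory):
# X_train, X_test, y_train, y_test = prepare_data(pd.read_csv("Credit.csv"))
```

Fixing `random_state` keeps the split reproducible across runs, which matters when comparing experiments later.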


🤔 Why Gaussian Naive Bayes?

GNB was chosen because it is simple, fast, and effective for classification tasks with approximately continuous variables. It serves as a strong baseline model for problems with data that follow (or approximate) a normal distribution.

Additionally:

  • Low computational cost
  • Easy interpretability
  • Suitable for initial pipeline deployment

⚙️ Technologies and Tools Used

  • Python
  • MLflow → Experiment tracking, model registry, deployment
  • mlflow.sklearn → Integration between Scikit-learn and MLflow
  • Scikit-learn → Training, evaluation (accuracy, F1, AUC), splitting data
  • Pandas → Data manipulation and analysis
  • NumPy → Numerical operations
  • Matplotlib → Data visualization (ROC, confusion matrix)
  • Seaborn → Advanced statistical visualizations (heatmaps)
  • Requests → HTTP requests for API consumption

These technologies cover the entire ML pipeline: data prep → training → evaluation → visualization → deployment → monitoring.
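To illustrate how Scikit-learn, MLflow, and mlflow.sklearn might fit together, here is a hedged sketch of training a GNB model and logging one run. The function names, the experiment name `gnb-credit`, the registered model name, and the synthetic data are all hypothetical stand-ins, not the repository's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def train_gnb(X_train, y_train, X_test, y_test, var_smoothing=1e-9):
    """Fit GaussianNB and return the model with its test metrics."""
    model = GaussianNB(var_smoothing=var_smoothing).fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    metrics = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }
    return model, metrics


def log_to_mlflow(model, params, metrics, registered_name=None):
    """Record one run; MLflow is imported here so training works without it."""
    import mlflow
    import mlflow.sklearn

    mlflow.set_experiment("gnb-credit")  # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        # A non-None registered_model_name also adds a registry version.
        mlflow.sklearn.log_model(
            model, "model", registered_model_name=registered_name
        )


# Synthetic stand-in for the prepared credit data (binary labels 0/1).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model, metrics = train_gnb(X_tr, y_tr, X_te, y_te, var_smoothing=1e-9)
# log_to_mlflow(model, {"var_smoothing": 1e-9}, metrics, "gnb-credit-model")
```

The logging call is left commented out because it writes to whichever tracking backend is configured, and registering a model may require a backend that supports the model registry.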


🧠 MLOps Pipeline Steps

  1. 📊 Data Analysis and Preparation

    • Data cleaning and transformation
    • Train/test split
  2. 🏗️ Model Training

    • Gaussian Naive Bayes with var_smoothing
    • Logging experiments in MLflow (parameters, metrics, artifacts)
  3. 📦 Versioning and Registration

    • Model registered in MLflow Registry
  4. 🚀 Model Deployment

    • Deployed locally as an MLflow service with HTTP endpoint for REST predictions
  5. 📈 Monitoring and Re-evaluation

    • Periodic revalidation with new data
    • Production metrics logging
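The deployment step above could be exercised with a small client like the following sketch. It assumes a model served locally with `mlflow models serve -m <model-uri> -p 5000`, an MLflow 2.x scoring server that accepts the `dataframe_split` JSON format on `/invocations`, and hypothetical feature names; none of these details are confirmed by the repository.

```python
def build_payload(columns, rows):
    """Build the JSON body for the MLflow scoring server (MLflow 2.x format)."""
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


def predict(columns, rows, url="http://127.0.0.1:5000/invocations"):
    """POST rows to a locally served model and return its predictions."""
    import requests  # local import: build_payload needs no third-party package

    response = requests.post(url, json=build_payload(columns, rows), timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of bad JSON
    return response.json()["predictions"]


# Hypothetical feature names and values, for illustration only.
payload = build_payload(["duration", "credit_amount"], [[24, 3500], [12, 900]])
```

The same `predict` helper could feed the monitoring step: score a batch of new labeled data periodically and log the resulting metrics as a new MLflow run.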

✅ Results

The model achieved the following results on the test dataset:

Metric       Value
Accuracy     0.6967
AUC          0.6600
F1 Score     0.7719
Log Loss     10.9332
Precision    0.7739
Recall       0.7700

Interpretation:

  • Accuracy (~69.7%) → ~7 out of 10 predictions correct
  • AUC (0.66) → modest class discrimination ability (0.5 is chance level)
  • F1 Score (0.77) → good balance between precision & recall
  • Precision (0.77) → 77% of positive predictions were correct
  • Recall (0.77) → 77% of actual positive cases detected
  • Log Loss (10.93) → very high; a model that always outputs probability 0.5 scores ln 2 ≈ 0.69, so this value points to confidently wrong probability estimates
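To put the log loss figure in context, the sketch below compares two toy cases on made-up labels. One possible explanation for a value above 10, which the source does not confirm, is that log loss was computed from hard 0/1 class predictions rather than from `predict_proba` output; overconfident probabilities from the model itself would be another. The arrays are illustrative, not the project's data.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1, 1, 0])

# Hard 0/1 predictions with 3 of 10 wrong, passed as if they were probabilities.
y_hard = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0], dtype=float)

# An uninformative model that always outputs probability 0.5.
y_half = np.full(len(y_true), 0.5)

# Each wrong hard label is penalized as a fully confident mistake
# (scikit-learn clips probabilities slightly away from 0 and 1).
loss_hard = log_loss(y_true, y_hard)

# Always predicting 0.5 gives ln(2), about 0.693.
loss_half = log_loss(y_true, y_half)

print(f"hard labels: {loss_hard:.2f}, constant 0.5: {loss_half:.3f}")
```

If hard labels were indeed the cause, passing `model.predict_proba(X_test)` to `log_loss` would give a figure that reflects the model's actual probability estimates.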

🔎 Conclusions

  • The Gaussian Naive Bayes algorithm proved to be a solid baseline model for the credit classification problem, reaching balanced performance in precision and recall.
  • However, the AUC and Log Loss suggest that the model still struggles to provide highly reliable probability estimates.
  • For future iterations, improvements could include:
    • Applying cross-validation for more robust evaluation and hyperparameter selection.
    • Testing ensemble methods (Random Forest, Gradient Boosting) or regularized models (Logistic Regression with an L1 or L2 penalty).
  • From an MLOps perspective, the project successfully demonstrates:
    • Full lifecycle automation
    • Experiment tracking and versioning with MLflow
    • Reproducibility and deployment of models into production-like environments
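The cross-validation improvement suggested above could be sketched as a grid search over `var_smoothing`. The grid bounds, the 5-fold stratified split, the AUC scoring choice, and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the prepared credit features and labels.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

search = GridSearchCV(
    GaussianNB(),
    # Ten candidate values spaced evenly on a log scale.
    param_grid={"var_smoothing": np.logspace(-12, -3, 10)},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_smoothing = search.best_params_["var_smoothing"]
best_auc = search.best_score_  # mean AUC across the 5 folds
```

Averaging the score over folds gives a less split-dependent estimate than a single train/test split, and each candidate's result could be logged as its own MLflow run.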

👉 This pipeline provides a scalable and reproducible foundation for deploying and monitoring ML models in real-world financial applications.
