GouriRajesh/mlops_lab_03

Penguin species classification API built with FastAPI and scikit-learn.

Penguin Species Prediction API

A machine learning API that predicts penguin species (Adelie, Chinstrap or Gentoo) based on body measurements. Built with FastAPI, scikit-learn and the Palmer Penguins dataset.

Project Structure

mlops_lab_03/
├── assets/                    # FastAPI results and screenshots
├── model/
│   └── penguin_species_model.pkl      # Trained model (generated after training)
├── src/
│   ├── data.py                # Load and split the Penguins dataset
│   ├── train.py               # Train a Random Forest model and save it
│   ├── predict.py             # Load the saved model and make predictions
│   └── main.py                # FastAPI app with health check and predict endpoints
├── requirements.txt
└── README.md

Dataset

The Palmer Penguins dataset contains body measurements for three penguin species from the Palmer Archipelago in Antarctica. The model uses four features to predict the species:

| Feature | Description |
|---|---|
| bill_length_mm | Length of the penguin's bill (mm) |
| bill_depth_mm | Depth of the penguin's bill (mm) |
| flipper_length_mm | Length of the flipper (mm) |
| body_mass_g | Body mass (grams) |
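
For illustration, the four features can be modelled as a small typed record. This is a sketch, not code from the repository: the `PenguinMeasurements` class, the `FEATURE_ORDER` tuple and the column order itself are assumptions, so check `data.py` for the order the model was actually trained on.

```python
from dataclasses import asdict, dataclass

# Assumed feature order; the real order is whatever data.py feeds to the model.
FEATURE_ORDER = (
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
)


@dataclass(frozen=True)
class PenguinMeasurements:
    """Hypothetical container for one penguin's body measurements."""

    bill_length_mm: float
    bill_depth_mm: float
    flipper_length_mm: float
    body_mass_g: float

    def to_feature_row(self) -> list[float]:
        """Return the values as a flat list ordered by FEATURE_ORDER."""
        values = asdict(self)
        return [float(values[name]) for name in FEATURE_ORDER]


sample = PenguinMeasurements(39.1, 18.7, 181.0, 3750.0)
print(sample.to_feature_row())
```

Keeping the order in one place makes it harder for training and prediction code to disagree about which column is which.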

Target classes:

  • 0 — Adelie
  • 1 — Chinstrap
  • 2 — Gentoo
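
Because the API returns the integer class rather than the species name, a client may want to decode it. A minimal sketch using the mapping listed above; the `SPECIES_BY_CLASS` dictionary and `decode_species` helper are hypothetical names, not part of the repository:

```python
# Mapping taken from the "Target classes" list in this README.
SPECIES_BY_CLASS = {0: "Adelie", 1: "Chinstrap", 2: "Gentoo"}


def decode_species(class_id: int) -> str:
    """Translate a predicted class id into its species name."""
    try:
        return SPECIES_BY_CLASS[class_id]
    except KeyError:
        # Fail loudly instead of returning a guess for an unknown id.
        raise ValueError(f"Unknown class id: {class_id!r}") from None


print(decode_species(0))
```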

Setup

1. Create and activate a virtual environment

python -m venv mlops_lab_env

macOS/Linux:

source mlops_lab_env/bin/activate

Windows:

mlops_lab_env\Scripts\activate

2. Install dependencies

pip install -r requirements.txt

If you don't have a requirements.txt yet, install the packages manually:

pip install fastapi uvicorn scikit-learn joblib seaborn numpy

Then save them:

pip freeze > requirements.txt

How to Run

Step 1: Train the model

From the src/ directory, run the training script. This loads the dataset, trains a Random Forest Classifier and saves the model to model/penguin_species_model.pkl.

cd src
python train.py

You should see the penguin_species_model.pkl file appear in the model/ directory.
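
For orientation, the training step could look roughly like the sketch below. It uses the settings this README states for `train.py` (100 trees, max depth 5, saved with joblib), but the `train_and_save` function, the `random_state` value and the three toy rows (copied from the Example Predictions table) are assumptions made only to keep the snippet self-contained. The real script trains on the full Palmer Penguins training split.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier


def train_and_save(X, y, model_path: Path) -> RandomForestClassifier:
    """Fit a Random Forest with the README's settings and persist it with joblib."""
    # random_state is an assumption, added here for reproducibility.
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X, y)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, str(model_path))
    return model


# Toy rows, one per species, used only to exercise the code path.
X_toy = [
    [39.1, 18.7, 181.0, 3750.0],
    [46.5, 17.9, 192.0, 3500.0],
    [46.1, 13.2, 211.0, 4500.0],
]
y_toy = [0, 1, 2]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model" / "penguin_species_model.pkl"
    train_and_save(X_toy, y_toy, path)
    saved = path.exists()
    # Reload to confirm the saved file round-trips.
    reloaded = joblib.load(str(path))

print(saved, reloaded.n_estimators, reloaded.max_depth)
```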

Step 2: Start the API

From the src/ directory, start the FastAPI server:

uvicorn main:app --reload

The server will start at http://127.0.0.1:8000.

Step 3: Test the API

Health check:

Visit http://127.0.0.1:8000 in your browser or run:

curl http://127.0.0.1:8000

Expected response:

{"status": "healthy"}

Predict endpoint:

Send a POST request with penguin measurements:

curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"bill_length_mm": 39.1, "bill_depth_mm": 18.7, "flipper_length_mm": 181.0, "body_mass_g": 3750.0}'

Expected response:

{"response": 0}
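
The same request can be sent from Python using only the standard library. The sketch below mirrors the curl command above; the `build_predict_request` and `predict` helpers are hypothetical names, and `predict` assumes the server from Step 2 is running locally.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(measurements: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /predict request without sending it."""
    body = json.dumps(measurements).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(measurements: dict, base_url: str = BASE_URL) -> int:
    """Send the request and return the value of the "response" field."""
    request = build_predict_request(measurements, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        payload = json.load(response)
    return payload["response"]


adelie = {
    "bill_length_mm": 39.1,
    "bill_depth_mm": 18.7,
    "flipper_length_mm": 181.0,
    "body_mass_g": 3750.0,
}
request = build_predict_request(adelie)
print(request.get_method(), request.full_url)
```

With the server running, calling `predict(adelie)` should return the integer class from the JSON body, which is 0 (Adelie) for the measurements shown above.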

Interactive docs:

Visit http://127.0.0.1:8000/docs for the Swagger UI where you can test the endpoints interactively.

Example Predictions

| Species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | Prediction |
|---|---|---|---|---|---|
| Adelie | 39.1 | 18.7 | 181.0 | 3750.0 | 0 |
| Chinstrap | 46.5 | 17.9 | 192.0 | 3500.0 | 1 |
| Gentoo | 46.1 | 13.2 | 211.0 | 4500.0 | 2 |

Files

  • data.py — Loads the Penguins dataset via seaborn, drops missing values, encodes species labels as integers and splits the data into training/testing sets (70/30 split).
  • train.py — Trains a Random Forest Classifier (100 trees, max depth 5) on the training data and saves the model using joblib.
  • predict.py — Loads the saved model and returns predictions for new input features.
  • main.py — FastAPI application with two endpoints:
    • GET / — Health check returning {"status": "healthy"}
    • POST /predict — Accepts penguin measurements and returns the predicted species class.