Penguin Species Prediction API

A machine learning API that predicts penguin species (Adelie, Chinstrap or Gentoo) based on body measurements. Built with FastAPI, scikit-learn and the Palmer Penguins dataset.

Project Structure

mlops_lab_03/
├── assets/                    # FastAPI results and screenshots
├── model/
│   └── penguin_species_model.pkl      # Trained model (generated after training)
├── src/
│   ├── data.py                # Load and split the Penguins dataset
│   ├── train.py               # Train a Random Forest model and save it
│   ├── predict.py             # Load the saved model and make predictions
│   └── main.py                # FastAPI app with health check and predict endpoints
├── requirements.txt
└── README.md

Dataset

The Palmer Penguins dataset contains body measurements for three penguin species from the Palmer Archipelago in Antarctica. The model uses four features to predict the species:

| Feature | Description |
| --- | --- |
| `bill_length_mm` | Length of the penguin's bill (mm) |
| `bill_depth_mm` | Depth of the penguin's bill (mm) |
| `flipper_length_mm` | Length of the flipper (mm) |
| `body_mass_g` | Body mass (g) |

Target classes:

0 — Adelie
1 — Chinstrap
2 — Gentoo
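
Because the API returns the class index rather than the species name, a client may want to translate it back. The snippet below is a minimal sketch of such a lookup; the names `SPECIES_BY_CLASS` and `species_name` are illustrative and are not part of this repository.

```python
# Mapping copied from the "Target classes" list above.
SPECIES_BY_CLASS = {0: "Adelie", 1: "Chinstrap", 2: "Gentoo"}


def species_name(class_id: int) -> str:
    """Translate a predicted class index into its species name."""
    try:
        return SPECIES_BY_CLASS[class_id]
    except KeyError:
        # Fail loudly instead of silently returning a wrong label.
        raise ValueError(f"unknown class id: {class_id!r}") from None
```

For example, `species_name(2)` returns `"Gentoo"`, and an index outside 0 to 2 raises `ValueError`.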

Setup

1. Create and activate a virtual environment

python -m venv mlops_lab_env

macOS/Linux:

source mlops_lab_env/bin/activate

Windows:

mlops_lab_env\Scripts\activate

2. Install dependencies

pip install -r requirements.txt

If you don't have a requirements.txt yet, install the packages manually:

pip install fastapi uvicorn scikit-learn joblib seaborn numpy

Then save them:

pip freeze > requirements.txt

How to Run

Step 1: Train the model

From the src/ directory, run the training script. This loads the dataset, trains a Random Forest Classifier and saves the model to model/penguin_species_model.pkl.

cd src
python train.py

You should see the penguin_species_model.pkl file appear in the model/ directory.
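
For orientation, the training step could look roughly like the sketch below. It is not the repository's actual `data.py` or `train.py`: the function names, the `random_state` value, the stratified split and the choice to drop only rows with missing feature or species values are assumptions. The 70/30 split, the 100 trees and the max depth of 5 follow the descriptions in this README.

```python
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
SPECIES_TO_CLASS = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}


def load_penguins():
    """Load the dataset via seaborn and return (X, y) as NumPy arrays."""
    import seaborn as sns  # lazy import: load_dataset fetches the CSV on first use

    df = sns.load_dataset("penguins")
    # Assumption: only rows missing a feature or the label are dropped.
    df = df.dropna(subset=FEATURES + ["species"])
    X = df[FEATURES].to_numpy()
    y = df["species"].map(SPECIES_TO_CLASS).to_numpy()
    return X, y


def train_and_save(X, y, model_path, random_state=42):
    """Fit a Random Forest on a 70/30 split, save it, and return (model, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state, stratify=y  # stratify is an assumption
    )
    model = RandomForestClassifier(
        n_estimators=100, max_depth=5, random_state=random_state
    )
    model.fit(X_train, y_train)

    model_path = Path(model_path)
    model_path.parent.mkdir(parents=True, exist_ok=True)  # create model/ if missing
    joblib.dump(model, model_path)
    return model, model.score(X_test, y_test)
```

When run from src/, a call such as `train_and_save(*load_penguins(), "../model/penguin_species_model.pkl")` would write the model file to the location described above, assuming that relative path matches your checkout.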

Step 2: Start the API

From the src/ directory, start the FastAPI server:

uvicorn main:app --reload

The server will start at http://127.0.0.1:8000.

Step 3: Test the API

Health check:

Visit http://127.0.0.1:8000 in your browser or run:

curl http://127.0.0.1:8000

Expected response:

{"status": "healthy"}

Predict endpoint:

Send a POST request with penguin measurements:

curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"bill_length_mm": 39.1, "bill_depth_mm": 18.7, "flipper_length_mm": 181.0, "body_mass_g": 3750.0}'

Expected response:

{"response": 0}
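
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is running at the address shown above; the helper names `build_predict_request` and `predict_species_class` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default uvicorn address used in this README


def build_predict_request(features: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /predict without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_species_class(features: dict, base_url: str = BASE_URL, timeout: float = 5.0) -> int:
    """Send the measurements to the API and return the predicted class index."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]


# Same measurements as the curl example above.
sample = {
    "bill_length_mm": 39.1,
    "bill_depth_mm": 18.7,
    "flipper_length_mm": 181.0,
    "body_mass_g": 3750.0,
}
```

With the server running, `predict_species_class(sample)` sends the same payload as the curl command and returns the integer from the `response` field.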

Interactive docs:

Visit http://127.0.0.1:8000/docs for the Swagger UI where you can test the endpoints interactively.

Example Predictions

| Species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | Prediction |
| --- | --- | --- | --- | --- | --- |
| Adelie | 39.1 | 18.7 | 181.0 | 3750.0 | 0 |
| Chinstrap | 46.5 | 17.9 | 192.0 | 3500.0 | 1 |
| Gentoo | 46.1 | 13.2 | 211.0 | 4500.0 | 2 |

Files

data.py — Loads the Penguins dataset via seaborn, drops missing values, encodes species labels as integers and splits the data into training/testing sets (70/30 split).
train.py — Trains a Random Forest Classifier (100 trees, max depth 5) on the training data and saves the model using joblib.
predict.py — Loads the saved model and returns predictions for new input features.
main.py — FastAPI application with two endpoints:
- GET / — Health check returning {"status": "healthy"}
- POST /predict — Accepts penguin measurements and returns the predicted species class.

GouriRajesh/mlops_lab_03