Penguin Species Prediction API
A machine learning API that predicts penguin species (Adelie, Chinstrap or Gentoo) based on body measurements. Built with FastAPI, scikit-learn and the Palmer Penguins dataset.
Project Structure
mlops_lab_03/
├── assets/ # FastAPI results and screenshots
├── model/
│ └── penguin_species_model.pkl # Trained model (generated after training)
├── src/
│ ├── data.py # Load and split the Penguins dataset
│ ├── train.py # Train a Random Forest model and save it
│ ├── predict.py # Load the saved model and make predictions
│ └── main.py # FastAPI app with health check and predict endpoints
├── requirements.txt
└── README.md
Dataset
The Palmer Penguins dataset contains body measurements for three penguin species from the Palmer Archipelago in Antarctica. The model uses four features to predict the species:
| Feature | Description |
|---|---|
| bill_length_mm | Length of the penguin's bill (mm) |
| bill_depth_mm | Depth of the penguin's bill (mm) |
| flipper_length_mm | Length of the flipper (mm) |
| body_mass_g | Body mass (grams) |
Target classes:
- 0: Adelie
- 1: Chinstrap
- 2: Gentoo
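Because the API returns an integer class rather than a name, a small lookup is handy on the client side. The following is a minimal sketch that assumes the class encoding and feature names listed in this section; `SPECIES_BY_CLASS`, `FEATURE_ORDER`, and `species_name` are illustrative names and are not taken from the project code.

```python
# Hypothetical helper: these names are illustrative, not part of the project.
SPECIES_BY_CLASS = {0: "Adelie", 1: "Chinstrap", 2: "Gentoo"}

# Feature order assumed to match the feature table in this section.
FEATURE_ORDER = (
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
)


def species_name(class_id: int) -> str:
    """Return the species name for an integer class, or raise on an unknown class."""
    try:
        return SPECIES_BY_CLASS[class_id]
    except KeyError:
        raise ValueError(f"Unknown class id: {class_id!r}") from None


print(species_name(0))  # Adelie
```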
Setup
1. Create and activate a virtual environment
python -m venv mlops_lab_env
macOS/Linux:
source mlops_lab_env/bin/activate
Windows:
mlops_lab_env\Scripts\activate
2. Install dependencies
pip install -r requirements.txt
If you don't have a requirements.txt yet, install the packages manually:
pip install fastapi uvicorn scikit-learn joblib seaborn numpy
Then save them:
pip freeze > requirements.txt
How to Run
Step 1: Train the model
From the src/ directory, run the training script. This loads the dataset, trains a Random Forest Classifier, and saves the model to model/penguin_species_model.pkl.
cd src
python train.py
You should see the penguin_species_model.pkl file appear in the model/ directory.
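For orientation, the training step might look roughly like the sketch below. It assumes the settings described in the Files section (100 trees, max depth 5, saved with joblib). The function name `train_and_save` and the `random_state` value are assumptions. The tiny stand-in dataset simply repeats the three rows from the Example Predictions table so the snippet runs without downloading anything, whereas the real script trains on the Palmer Penguins split.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier


def train_and_save(X, y, model_path):
    """Fit a Random Forest (100 trees, max depth 5) and save it with joblib."""
    # random_state is an assumption added here for reproducibility.
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X, y)
    joblib.dump(model, model_path)
    return model


# Stand-in data: the three rows from the Example Predictions table, repeated.
rows = [
    [39.1, 18.7, 181.0, 3750.0],  # Adelie -> 0
    [46.5, 17.9, 192.0, 3500.0],  # Chinstrap -> 1
    [46.1, 13.2, 211.0, 4500.0],  # Gentoo -> 2
]
X = rows * 10
y = [0, 1, 2] * 10

# Save to a temporary directory, then reload to confirm the round trip works.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "penguin_species_model.pkl")
    train_and_save(X, y, path)
    reloaded = joblib.load(path)
    prediction = int(reloaded.predict([rows[0]])[0])

print(prediction)
```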
Step 2: Start the API
From the src/ directory, start the FastAPI server:
uvicorn main:app --reload
The server will start at http://127.0.0.1:8000.
Step 3: Test the API
Health check:
Visit http://127.0.0.1:8000 in your browser or run:
curl http://127.0.0.1:8000
Expected response:
{"status": "healthy"}
Predict endpoint:
Send a POST request with penguin measurements:
curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d '{"bill_length_mm": 39.1, "bill_depth_mm": 18.7, "flipper_length_mm": 181.0, "body_mass_g": 3750.0}'
Expected response:
{"response": 0}
Interactive docs:
Visit http://127.0.0.1:8000/docs for the Swagger UI where you can test the endpoints interactively.
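The same POST request can also be sent from Python. This is a minimal sketch using only the standard library. It assumes the server is running at the address shown above and that the response has the `{"response": <int>}` shape from the curl example. The helper names `build_predict_request` and `predict_species_class` are hypothetical.

```python
import json
import urllib.request

# Assumed local address, taken from the uvicorn default shown in Step 2.
API_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(measurements: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the four penguin measurements."""
    body = json.dumps(measurements).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_species_class(measurements: dict, url: str = API_URL) -> int:
    """Send the request and return the integer class from the JSON response."""
    request = build_predict_request(measurements, url)
    with urllib.request.urlopen(request, timeout=5) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return int(payload["response"])


# Same measurements as the curl example.
sample = {
    "bill_length_mm": 39.1,
    "bill_depth_mm": 18.7,
    "flipper_length_mm": 181.0,
    "body_mass_g": 3750.0,
}
```

With the server running, calling `predict_species_class(sample)` should return the same class as the curl example.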
Example Predictions
| Species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | Prediction |
|---|---|---|---|---|---|
| Adelie | 39.1 | 18.7 | 181.0 | 3750.0 | 0 |
| Chinstrap | 46.5 | 17.9 | 192.0 | 3500.0 | 1 |
| Gentoo | 46.1 | 13.2 | 211.0 | 4500.0 | 2 |
Files
- data.py: Loads the Penguins dataset via seaborn, drops missing values, encodes species labels as integers, and splits the data into training and testing sets (70/30 split).
- train.py: Trains a Random Forest Classifier (100 trees, max depth 5) on the training data and saves the model using joblib.
- predict.py: Loads the saved model and returns predictions for new input features.
- main.py: FastAPI application with two endpoints:
  - GET /: Health check returning {"status": "healthy"}
  - POST /predict: Accepts penguin measurements and returns the predicted species class.