GouriRajesh/mlops_lab_05

CardioScan: Heart Disease Prediction with Docker

A containerized ML application that trains a neural network on the UCI Cleveland Heart Disease dataset and serves predictions through a Flask web interface. The project demonstrates multi-stage Docker builds and Docker Compose for separating model training from model serving.

Project Overview

This project follows a two-stage containerized ML workflow:

  • Stage 1 — Model Training: Downloads the Heart Disease dataset, preprocesses it, trains a TensorFlow/Keras binary classifier and saves both the trained model (my_model.keras) and the scaler parameters (scaler_params.npz).
  • Stage 2 — Model Serving: Loads the trained model and scaler and exposes a Flask web app where users enter patient data through a form; the app returns a prediction of whether heart disease is present, along with a risk probability score.

The two stages are decoupled so that training happens once and the lightweight serving container can be deployed independently.

Dataset

The project uses the UCI Cleveland Heart Disease dataset, a widely used benchmark in medical ML research.

  • Source: UCI ML Repository — Heart Disease
  • Records: 303 patients (297 after dropping the 6 rows with missing values)
  • Features: 13 clinical attributes (age, sex, chest pain type, blood pressure, cholesterol, etc.)
  • Target: Binary classification — 0 (no heart disease) vs. 1 (heart disease present)

The dataset is fetched directly from the UCI repository URL during training, so no local CSV file is needed.
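The fetch-and-clean step can be sketched as follows. This is an illustrative reconstruction, not the repo's code: two inline sample rows stand in for the UCI download (the real script reads the `processed.cleveland.data` file from the UCI repository URL), and the column names follow the UCI documentation for the processed Cleveland file, which has no header row.

```python
import io
import pandas as pd

# Column names as documented for the processed Cleveland file (no header row).
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# Inline rows stand in for the UCI download; '?' marks missing values,
# and the raw target column ranges 0-4.
raw = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2\n"
    "37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,?,1\n"
)
df = pd.read_csv(raw, names=columns, na_values="?")

df = df.dropna()                               # drop rows with missing values
df["target"] = (df["target"] > 0).astype(int)  # collapse 0-4 into binary 0/1
```

The third sample row is dropped because of its `?`, and the row with raw target 2 becomes class 1 after binarization.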

Project Structure

project/
├── Dockerfile                # Multi-stage build (train + serve)
├── docker-compose.yml        # Two-service orchestration
├── requirements.txt          # Python dependencies
├── README.md
└── src/
    ├── model_training.py     # Data loading, preprocessing, model training
    ├── main.py               # Flask serving application
    ├── statics/
    │   ├── healthy.jpeg      # Image shown for "No Disease" prediction
    │   └── disease.jpeg      # Image shown for "Disease Present" prediction
    └── templates/
        └── predict.html      # Web UI for input and results

How It Works

Training Pipeline (model_training.py)

  1. Data Loading: Fetches the Cleveland Heart Disease CSV from the UCI repository.
  2. Cleaning: Drops rows with missing values (marked as ? in the raw data).
  3. Target Encoding: The original target has values 0–4; these are collapsed to binary (0 = no disease, >0 = disease present).
  4. Train/Test Split: 80/20 split with random_state=42 for reproducibility.
  5. Feature Scaling: StandardScaler is applied to normalize all 13 features. The scaler's mean_ and scale_ arrays are saved to scaler_params.npz so the serving app can apply the same transformation at inference time.
  6. Model Architecture:
    • Dense(16, relu) → Dense(8, relu) → Dense(1, sigmoid)
    • Loss: binary_crossentropy
    • Optimizer: adam
    • Trained for 80 epochs with batch size 16
  7. Output: Saves my_model.keras and scaler_params.npz to the working directory.
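The key artifact-handoff detail in step 5 is that persisting `mean_` and `scale_` is enough to reproduce `StandardScaler` exactly at inference time, since the transform is just `(x - mean) / scale`. A minimal NumPy sketch (toy data, and an in-memory buffer instead of the project's `scaler_params.npz` file, so it is self-contained):

```python
import io
import numpy as np

# Toy feature matrix standing in for the 13 clinical features.
X_train = np.array([[63.0, 145.0],
                    [37.0, 130.0],
                    [56.0, 120.0]])

# What StandardScaler.fit computes: per-feature mean and standard deviation.
mean = X_train.mean(axis=0)
scale = X_train.std(axis=0)

# Training side: persist the parameters (the project writes scaler_params.npz;
# a BytesIO buffer is used here so the sketch touches no disk).
buf = io.BytesIO()
np.savez(buf, mean=mean, scale=scale)

# Serving side: reload and apply the identical transform at inference time.
buf.seek(0)
params = np.load(buf)
x_new = np.array([[50.0, 140.0]])
x_scaled = (x_new - params["mean"]) / params["scale"]
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`), which matches what `StandardScaler` computes.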

Serving Pipeline (main.py)

  1. Startup: Loads my_model.keras and scaler_params.npz.
  2. GET /predict: Renders the predict.html form.
  3. POST /predict: Reads the 13 form fields, applies the saved scaler transform, runs inference, and returns a JSON response with predicted_class ("Disease Present" or "No Disease") and probability (0.0–1.0).
  4. Web UI: The HTML page displays a glassmorphism-styled form. On submission, it shows a color-coded badge (red for disease, green for healthy), a probability risk bar and the corresponding image from statics/.
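The response shaping in step 3 can be sketched as a small pure function. The function name is hypothetical and a 0.5 threshold on the sigmoid output is assumed (the usual default for binary classifiers); in the real app this logic sits inside the Flask `POST /predict` handler, after scaling the 13 form fields and calling `model.predict()`.

```python
import json

def make_prediction_response(probability: float) -> str:
    """Build the JSON body described above from the model's sigmoid output.

    Assumes a 0.5 decision threshold; name and rounding are illustrative.
    """
    predicted_class = "Disease Present" if probability >= 0.5 else "No Disease"
    return json.dumps({
        "predicted_class": predicted_class,
        "probability": round(float(probability), 4),
    })
```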

Prerequisites

  • Docker installed and running
  • Docker Compose
  • Two images placed in src/statics/:
    • healthy.jpeg — displayed when prediction is "No Disease"
    • disease.jpeg — displayed when prediction is "Disease Present"

Running with Docker Compose

Docker Compose runs training and serving as two separate services, connected by a shared volume.

Step 1: Build and Start

docker-compose up --build

What happens:

  1. The model-training service starts:

    • Installs dependencies from requirements.txt
    • Runs model_training.py (downloads data, trains model)
    • Copies my_model.keras and scaler_params.npz to the shared model_exchange volume
    • Container exits after training completes
  2. The serving service starts (only after training succeeds, via depends_on with service_completed_successfully):

    • Installs dependencies
    • Copies the model and scaler from the shared volume
    • Launches Flask on port 4000
    • Port mapping 80:4000 makes it accessible on your host at port 80
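The sequence above implies a docker-compose.yml roughly like the following. This is a sketch inferred from the description (service names, mount path, and commands may differ from the repo's actual file); the load-bearing pieces are the shared `model_exchange` volume and the `service_completed_successfully` dependency condition.

```yaml
services:
  model-training:
    build: .
    command: python src/model_training.py
    volumes:
      - model_exchange:/models

  serving:
    build: .
    command: python src/main.py
    ports:
      - "80:4000"
    volumes:
      - model_exchange:/models
    depends_on:
      model-training:
        condition: service_completed_successfully

volumes:
  model_exchange:
```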

Step 2: Open the App

Navigate to:

http://localhost/predict

Step 3: Stop

docker-compose down

To also remove the shared volume:

docker-compose down -v

Running with Multi-Stage Dockerfile

The Dockerfile packages both stages into a single build, with the trained model artifacts passed from the first stage to the second.

Step 1: Build

docker build -t cardioscan .

What happens during build:

  • Stage 1 (model_training): Installs deps, runs training, produces my_model.keras and scaler_params.npz inside the build layer.
  • Stage 2 (serving): Starts from a fresh python:3.10 image, copies only the model artifacts from Stage 1 (via COPY --from=model_training), installs deps and sets up the Flask app. This keeps the final image smaller since training-only dependencies and intermediate files are discarded.
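A multi-stage Dockerfile matching this description might look like the sketch below. The stage name `model_training` comes from the `COPY --from=model_training` mentioned above; the `/app` paths and exact commands are assumptions, not copied from the repo.

```dockerfile
# Stage 1: train the model at build time
FROM python:3.10 AS model_training
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
# Writes my_model.keras and scaler_params.npz into /app
RUN python src/model_training.py

# Stage 2: ship only the serving code plus the trained artifacts
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
COPY --from=model_training /app/my_model.keras .
COPY --from=model_training /app/scaler_params.npz .
EXPOSE 4000
CMD ["python", "src/main.py"]
```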

Step 2: Run

docker run -p 80:4000 cardioscan

Step 3: Open the App

Navigate to:

http://localhost/predict

Step 4: Stop

Press Ctrl+C in the terminal, or:

docker stop <container_id>

Using the Web Interface

  1. Open http://localhost/predict in your browser.
  2. Fill in the 13 patient feature fields (see the sample input tables below).
  3. Click SCAN.
  4. The result card appears with:
    • A color-coded badge: red "DISEASE PRESENT" or green "NO DISEASE"
    • A probability risk bar showing the model's confidence (0%–100%)
    • The corresponding image (disease.jpeg or healthy.jpeg)

Sample Input (High Risk)

| Field | Value |
|---|---|
| Age | 63 |
| Sex | Male |
| Chest Pain Type | Asymptomatic |
| Resting BP | 145 |
| Cholesterol | 233 |
| Fasting Blood Sugar >120 | Yes |
| Resting ECG | LV Hypertrophy |
| Max Heart Rate | 150 |
| Exercise Angina | No |
| ST Depression | 2.3 |
| ST Slope | Downsloping |
| # Major Vessels | 0 |
| Thalassemia | Fixed Defect |

Sample Input (Low Risk)

| Field | Value |
|---|---|
| Age | 35 |
| Sex | Female |
| Chest Pain Type | Non-anginal Pain |
| Resting BP | 120 |
| Cholesterol | 198 |
| Fasting Blood Sugar >120 | No |
| Resting ECG | Normal |
| Max Heart Rate | 172 |
| Exercise Angina | No |
| ST Depression | 0.0 |
| ST Slope | Upsloping |
| # Major Vessels | 0 |
| Thalassemia | Normal |

Architecture Details

Why Two Stages?

Separating training from serving is a common MLOps pattern:

  • Training is resource-intensive, runs once (or periodically) and produces artifacts (model weights, scaler params).
  • Serving is lightweight, runs continuously and only needs the artifacts plus inference dependencies.

Multi-Stage Dockerfile vs. Docker Compose

| Aspect | Multi-Stage Dockerfile | Docker Compose |
|---|---|---|
| How artifacts transfer | COPY --from=model_training between build stages | Shared named volume (model_exchange) |
| When training runs | At docker build time | At docker-compose up time |
| Final image contains | Only serving code + artifacts | Two separate containers |
| Best for | CI/CD pipelines, single deployable image | Local development, modular services |