GouriRajesh/mlops_lab_05

CardioScan: Heart Disease Prediction with Docker

A containerized ML application that trains a neural network on the UCI Cleveland Heart Disease dataset and serves predictions through a Flask web interface. The project demonstrates multi-stage Docker builds and Docker Compose for separating model training from model serving.

Project Overview

This project follows a two-stage containerized ML workflow:

  • Stage 1 — Model Training: Downloads the Heart Disease dataset, preprocesses it, trains a TensorFlow/Keras binary classifier and saves both the trained model (my_model.keras) and the scaler parameters (scaler_params.npz).
  • Stage 2 — Model Serving: Loads the trained model and scaler and exposes a Flask web app where users enter patient data through a form; the app returns a prediction of whether heart disease is present, along with a risk probability score.

The two stages are decoupled so that training happens once and the lightweight serving container can be deployed independently.

Dataset

The project uses the UCI Cleveland Heart Disease dataset, a widely used benchmark in medical ML research.

  • Source: UCI ML Repository — Heart Disease
  • Records: 303 patients (297 after dropping the 6 rows with missing values)
  • Features: 13 clinical attributes (age, sex, chest pain type, blood pressure, cholesterol, etc.)
  • Target: Binary classification — 0 (no heart disease) vs. 1 (heart disease present)

The dataset is fetched directly from the UCI repository URL during training, so no local CSV file is needed.
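The fetch-and-clean step can be sketched as follows. This is an illustrative reconstruction, not the repo's code: two inline sample rows stand in for the UCI download (the real script reads the `processed.cleveland.data` file from the UCI repository URL), and the column names follow the UCI documentation for the processed Cleveland file, which has no header row.

```python
import io
import pandas as pd

# Column names as documented for the processed Cleveland file (no header row).
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# Inline rows stand in for the UCI download; '?' marks missing values,
# and the raw target column ranges 0-4.
raw = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2\n"
    "37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,?,1\n"
)
df = pd.read_csv(raw, names=columns, na_values="?")

df = df.dropna()                               # drop rows with missing values
df["target"] = (df["target"] > 0).astype(int)  # collapse 0-4 into binary 0/1
```

The third sample row is dropped because of its `?`, and the row with raw target 2 becomes class 1 after binarization.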

Project Structure

project/
├── Dockerfile                # Multi-stage build (train + serve)
├── docker-compose.yml        # Two-service orchestration
├── requirements.txt          # Python dependencies
├── README.md
└── src/
    ├── model_training.py     # Data loading, preprocessing, model training
    ├── main.py               # Flask serving application
    ├── statics/
    │   ├── healthy.jpeg      # Image shown for "No Disease" prediction
    │   └── disease.jpeg      # Image shown for "Disease Present" prediction
    └── templates/
        └── predict.html      # Web UI for input and results

How It Works

Training Pipeline (model_training.py)

  1. Data Loading: Fetches the Cleveland Heart Disease CSV from the UCI repository.
  2. Cleaning: Drops rows with missing values (marked as ? in the raw data).
  3. Target Encoding: The original target has values 0–4; these are collapsed to binary (0 = no disease, >0 = disease present).
  4. Train/Test Split: 80/20 split with random_state=42 for reproducibility.
  5. Feature Scaling: StandardScaler is applied to normalize all 13 features. The scaler's mean_ and scale_ arrays are saved to scaler_params.npz so the serving app can apply the same transformation at inference time.
  6. Model Architecture:
    • Dense(16, relu) → Dense(8, relu) → Dense(1, sigmoid)
    • Loss: binary_crossentropy
    • Optimizer: adam
    • Trained for 80 epochs with batch size 16
  7. Output: Saves my_model.keras and scaler_params.npz to the working directory.
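The key artifact-handoff detail in step 5 is that persisting `mean_` and `scale_` is enough to reproduce `StandardScaler` exactly at inference time, since the transform is just `(x - mean) / scale`. A minimal NumPy sketch (toy data, and an in-memory buffer instead of the project's `scaler_params.npz` file, so it is self-contained):

```python
import io
import numpy as np

# Toy feature matrix standing in for the 13 clinical features.
X_train = np.array([[63.0, 145.0],
                    [37.0, 130.0],
                    [56.0, 120.0]])

# What StandardScaler.fit computes: per-feature mean and standard deviation.
mean = X_train.mean(axis=0)
scale = X_train.std(axis=0)

# Training side: persist the parameters (the project writes scaler_params.npz;
# a BytesIO buffer is used here so the sketch touches no disk).
buf = io.BytesIO()
np.savez(buf, mean=mean, scale=scale)

# Serving side: reload and apply the identical transform at inference time.
buf.seek(0)
params = np.load(buf)
x_new = np.array([[50.0, 140.0]])
x_scaled = (x_new - params["mean"]) / params["scale"]
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`), which matches what `StandardScaler` computes.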

Serving Pipeline (main.py)

  1. Startup: Loads my_model.keras and scaler_params.npz.
  2. GET /predict: Renders the predict.html form.
  3. POST /predict: Reads the 13 form fields, applies the saved scaler transform, runs inference, and returns a JSON response with predicted_class ("Disease Present" or "No Disease") and probability (0.0–1.0).
  4. Web UI: The HTML page displays a glassmorphism-styled form. On submission, it shows a color-coded badge (red for disease, green for healthy), a probability risk bar and the corresponding image from statics/.
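The response shaping in step 3 can be sketched as a small pure function. The function name is hypothetical and a 0.5 threshold on the sigmoid output is assumed (the usual default for binary classifiers); in the real app this logic sits inside the Flask `POST /predict` handler, after scaling the 13 form fields and calling `model.predict()`.

```python
import json

def make_prediction_response(probability: float) -> str:
    """Build the JSON body described above from the model's sigmoid output.

    Assumes a 0.5 decision threshold; name and rounding are illustrative.
    """
    predicted_class = "Disease Present" if probability >= 0.5 else "No Disease"
    return json.dumps({
        "predicted_class": predicted_class,
        "probability": round(float(probability), 4),
    })
```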

Prerequisites

  • Docker installed and running
  • Docker Compose
  • Two images placed in src/statics/:
    • healthy.jpeg — displayed when prediction is "No Disease"
    • disease.jpeg — displayed when prediction is "Disease Present"

Running with Docker Compose

Docker Compose runs training and serving as two separate services, connected by a shared volume.

Step 1: Build and Start

docker-compose up --build

What happens:

  1. The model-training service starts:

    • Installs dependencies from requirements.txt
    • Runs model_training.py (downloads data, trains model)
    • Copies my_model.keras and scaler_params.npz to the shared model_exchange volume
    • Container exits after training completes
  2. The serving service starts (only after training succeeds, via depends_on with service_completed_successfully):

    • Installs dependencies
    • Copies the model and scaler from the shared volume
    • Launches Flask on port 4000
    • Port mapping 80:4000 makes it accessible on your host at port 80
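The sequence above implies a docker-compose.yml roughly like the following. This is a sketch inferred from the description (service names, mount path, and commands may differ from the repo's actual file); the load-bearing pieces are the shared `model_exchange` volume and the `service_completed_successfully` dependency condition.

```yaml
services:
  model-training:
    build: .
    command: python src/model_training.py
    volumes:
      - model_exchange:/models

  serving:
    build: .
    command: python src/main.py
    ports:
      - "80:4000"
    volumes:
      - model_exchange:/models
    depends_on:
      model-training:
        condition: service_completed_successfully

volumes:
  model_exchange:
```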

Step 2: Open the App

Navigate to:

http://localhost/predict

Step 3: Stop

docker-compose down

To also remove the shared volume:

docker-compose down -v

Running with Multi-Stage Dockerfile

The Dockerfile packages both stages into a single build, with the trained model artifacts passed from the first stage to the second.

Step 1: Build

docker build -t cardioscan .

What happens during build:

  • Stage 1 (model_training): Installs deps, runs training, produces my_model.keras and scaler_params.npz inside the build layer.
  • Stage 2 (serving): Starts from a fresh python:3.10 image, copies only the model artifacts from Stage 1 (via COPY --from=model_training), installs deps and sets up the Flask app. This keeps the final image smaller since training-only dependencies and intermediate files are discarded.
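A multi-stage Dockerfile matching this description might look like the sketch below. The stage name `model_training` comes from the `COPY --from=model_training` mentioned above; the `/app` paths and exact commands are assumptions, not copied from the repo.

```dockerfile
# Stage 1: train the model at build time
FROM python:3.10 AS model_training
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
# Writes my_model.keras and scaler_params.npz into /app
RUN python src/model_training.py

# Stage 2: ship only the serving code plus the trained artifacts
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
COPY --from=model_training /app/my_model.keras .
COPY --from=model_training /app/scaler_params.npz .
EXPOSE 4000
CMD ["python", "src/main.py"]
```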

Step 2: Run

docker run -p 80:4000 cardioscan

Step 3: Open the App

Navigate to:

http://localhost/predict

Step 4: Stop

Press Ctrl+C in the terminal, or:

docker stop <container_id>

Using the Web Interface

  1. Open http://localhost/predict in your browser.
  2. Fill in the 13 patient feature fields (see the sample input tables below).
  3. Click SCAN.
  4. The result card appears with:
    • A color-coded badge: red "DISEASE PRESENT" or green "NO DISEASE"
    • A probability risk bar showing the model's confidence (0%–100%)
    • The corresponding image (disease.jpeg or healthy.jpeg)

Sample Input (High Risk)

| Field | Value |
|---|---|
| Age | 63 |
| Sex | Male |
| Chest Pain Type | Asymptomatic |
| Resting BP | 145 |
| Cholesterol | 233 |
| Fasting Blood Sugar >120 | Yes |
| Resting ECG | LV Hypertrophy |
| Max Heart Rate | 150 |
| Exercise Angina | No |
| ST Depression | 2.3 |
| ST Slope | Downsloping |
| # Major Vessels | 0 |
| Thalassemia | Fixed Defect |

Sample Input (Low Risk)

| Field | Value |
|---|---|
| Age | 35 |
| Sex | Female |
| Chest Pain Type | Non-anginal Pain |
| Resting BP | 120 |
| Cholesterol | 198 |
| Fasting Blood Sugar >120 | No |
| Resting ECG | Normal |
| Max Heart Rate | 172 |
| Exercise Angina | No |
| ST Depression | 0.0 |
| ST Slope | Upsloping |
| # Major Vessels | 0 |
| Thalassemia | Normal |

Architecture Details

Why Two Stages?

Separating training from serving is a common MLOps pattern:

  • Training is resource-intensive, runs once (or periodically) and produces artifacts (model weights, scaler params).
  • Serving is lightweight, runs continuously and only needs the artifacts plus inference dependencies.

Multi-Stage Dockerfile vs. Docker Compose

| Aspect | Multi-Stage Dockerfile | Docker Compose |
|---|---|---|
| How artifacts transfer | COPY --from=model_training between build stages | Shared named volume (model_exchange) |
| When training runs | At docker build time | At docker-compose up time |
| Final image contains | Only serving code + artifacts | Two separate containers |
| Best for | CI/CD pipelines, single deployable image | Local development, modular services |