CardioScan: Heart Disease Prediction with Docker
A containerized ML application that trains a neural network on the UCI Cleveland Heart Disease dataset and serves predictions through a Flask web interface. The project demonstrates multi-stage Docker builds and Docker Compose for separating model training from model serving.
Project Overview
This project follows a two-stage containerized ML workflow:
- Stage 1 — Model Training: Downloads the Heart Disease dataset, preprocesses it, trains a TensorFlow/Keras binary classifier, and saves both the trained model (`my_model.keras`) and the scaler parameters (`scaler_params.npz`).
- Stage 2 — Model Serving: Loads the trained model and scaler and exposes a Flask web app where users enter patient data through a form and receive a prediction of whether heart disease is present, along with a risk probability score.
The two stages are decoupled so that training happens once and the lightweight serving container can be deployed independently.
Dataset
The project uses the UCI Cleveland Heart Disease dataset, a widely used benchmark in medical ML research.
- Source: UCI ML Repository — Heart Disease
- Records: ~303 patients (after dropping rows with missing values)
- Features: 13 clinical attributes (age, sex, chest pain type, blood pressure, cholesterol, etc.)
- Target: Binary classification — `0` (no heart disease) vs. `1` (heart disease present)
The dataset is fetched directly from the UCI repository URL during training, so no local CSV file is needed.
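The cleaning and target-binarization steps described below can be sketched in a few lines of pandas. This is a minimal illustration using an inline sample in place of the UCI download; the column names follow the standard Cleveland documentation, and the sample rows are made up:

```python
import pandas as pd

# Column names per the standard UCI Cleveland documentation.
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# Inline sample mimicking the raw file: '?' marks missing values,
# and the original target ranges over 0-4.
raw = pd.DataFrame([
    [63, 1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, "0", "6", 0],
    [67, 1, 4, 160, 286, 0, 2, 108, 1, 1.5, 2, "3", "3", 2],
    [41, 0, 2, 130, 204, 0, 2, 172, 0, 1.4, 1, "?", "3", 0],
], columns=columns)

# Cleaning: drop rows containing '?' placeholders.
df = raw.replace("?", pd.NA).dropna()

# Target encoding: collapse 0-4 into binary (0 = no disease, >0 = disease).
df["target"] = (df["target"] > 0).astype(int)
```

In the real pipeline the DataFrame comes from the UCI repository URL instead of the inline sample, but the two transformations are the same.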
Project Structure
```
project/
├── Dockerfile             # Multi-stage build (train + serve)
├── docker-compose.yml     # Two-service orchestration
├── requirements.txt       # Python dependencies
├── README.md
└── src/
    ├── model_training.py  # Data loading, preprocessing, model training
    ├── main.py            # Flask serving application
    ├── statics/
    │   ├── healthy.jpeg   # Image shown for "No Disease" prediction
    │   └── disease.jpeg   # Image shown for "Disease Present" prediction
    └── templates/
        └── predict.html   # Web UI for input and results
```
How It Works
Training Pipeline (`model_training.py`)
- Data Loading: Fetches the Cleveland Heart Disease CSV from the UCI repository.
- Cleaning: Drops rows with missing values (marked as `?` in the raw data).
- Target Encoding: The original target has values 0–4; these are collapsed to binary (0 = no disease, >0 = disease present).
- Train/Test Split: 80/20 split with `random_state=42` for reproducibility.
- Feature Scaling: `StandardScaler` is applied to normalize all 13 features. The scaler's `mean_` and `scale_` arrays are saved to `scaler_params.npz` so the serving app can apply the same transformation at inference time.
- Model Architecture:
  - Dense(16, relu) → Dense(8, relu) → Dense(1, sigmoid)
  - Loss: `binary_crossentropy`
  - Optimizer: `adam`
  - Trained for 80 epochs with batch size 16
- Output: Saves `my_model.keras` and `scaler_params.npz` to the working directory.
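The scaler hand-off is the part of this pipeline that most often trips people up, so here is a numpy-only sketch of the save/load round trip. It stands in for `StandardScaler` (whose `mean_` and `scale_` are just the per-feature mean and standard deviation); the `mean`/`scale` key names inside the `.npz` file are an assumption:

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 13))  # 13 clinical features

# What StandardScaler computes under the hood: per-feature mean and std.
mean_ = X_train.mean(axis=0)
scale_ = X_train.std(axis=0)

# Training side: persist the parameters next to the model.
path = os.path.join(tempfile.mkdtemp(), "scaler_params.npz")
np.savez(path, mean=mean_, scale=scale_)

# Serving side: reload and apply the identical transform at inference time.
params = np.load(path)
scaled_train = (X_train - params["mean"]) / params["scale"]
```

Saving only the two small arrays (instead of pickling the fitted scaler) keeps the serving container free of any dependency on the training code.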
Serving Pipeline (`main.py`)
- Startup: Loads `my_model.keras` and `scaler_params.npz`.
- GET /predict: Renders the `predict.html` form.
- POST /predict: Reads the 13 form fields, applies the saved scaler transform, runs inference, and returns a JSON response with `predicted_class` ("Disease Present" or "No Disease") and `probability` (0.0–1.0).
- Web UI: The HTML page displays a glassmorphism-styled form. On submission, it shows a color-coded badge (red for disease, green for healthy), a probability risk bar, and the corresponding image from `statics/`.
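The POST /predict logic reduces to scale → predict → threshold. A minimal sketch of that handler body, with a stub in place of the Keras model; the `predict_payload` helper and the 0.5 threshold are illustrative assumptions, while the response keys follow the description above:

```python
import numpy as np

def predict_payload(form_values, mean, scale, model_fn, threshold=0.5):
    """Hypothetical mirror of the POST /predict handler: scale the 13
    inputs, run the model, and build the JSON-style response."""
    x = (np.asarray(form_values, dtype=float).reshape(1, -1) - mean) / scale
    prob = float(model_fn(x))  # sigmoid output in [0, 1]
    label = "Disease Present" if prob >= threshold else "No Disease"
    return {"predicted_class": label, "probability": prob}

# Stub standing in for the Keras model's predict(...) call.
stub = lambda x: 0.87

resp = predict_payload([0.0] * 13, np.zeros(13), np.ones(13), stub)
```

In the real app, `mean` and `scale` come from `scaler_params.npz` and `model_fn` wraps the loaded `my_model.keras`.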
Prerequisites
- Docker installed and running
- Docker Compose
- Two images placed in `src/statics/`:
  - `healthy.jpeg` — displayed when prediction is "No Disease"
  - `disease.jpeg` — displayed when prediction is "Disease Present"
Running with Docker Compose
Docker Compose runs training and serving as two separate services, connected by a shared volume.
Step 1: Build and Start
```bash
docker-compose up --build
```
What happens:
1. The `model-training` service starts:
   - Installs dependencies from `requirements.txt`
   - Runs `model_training.py` (downloads data, trains model)
   - Copies `my_model.keras` and `scaler_params.npz` to the shared `model_exchange` volume
   - Container exits after training completes
2. The `serving` service starts (only after training succeeds, via `depends_on` with `service_completed_successfully`):
   - Installs dependencies
   - Copies the model and scaler from the shared volume
   - Launches Flask on port 4000
   - Port mapping `80:4000` makes it accessible on your host at port 80
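The orchestration above corresponds roughly to a compose file like the following. The service names, the `model_exchange` volume, the port mapping, and the `depends_on` condition come from this README; the build contexts, commands, and container paths are assumptions:

```yaml
# Hypothetical docker-compose.yml sketch consistent with the behavior above.
services:
  model-training:
    build: .
    command: python src/model_training.py
    volumes:
      - model_exchange:/app/model   # training writes artifacts here

  serving:
    build: .
    command: python src/main.py
    ports:
      - "80:4000"                   # host port 80 -> Flask on 4000
    volumes:
      - model_exchange:/app/model   # serving reads artifacts from here
    depends_on:
      model-training:
        condition: service_completed_successfully

volumes:
  model_exchange:
```

The `service_completed_successfully` condition is what guarantees the serving container never starts against an empty volume.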
Step 2: Open the App
Navigate to:
http://localhost/predict
Step 3: Stop
```bash
docker-compose down
```
To also remove the shared volume:
```bash
docker-compose down -v
```
Running with Multi-Stage Dockerfile
The Dockerfile packages both stages into a single build, with the trained model artifacts passed from the first stage to the second.
Step 1: Build
```bash
docker build -t cardioscan .
```
What happens during build:
- Stage 1 (`model_training`): Installs deps, runs training, produces `my_model.keras` and `scaler_params.npz` inside the build layer.
- Stage 2 (`serving`): Starts from a fresh `python:3.10` image, copies only the model artifacts from Stage 1 (via `COPY --from=model_training`), installs deps, and sets up the Flask app. This keeps the final image smaller since training-only dependencies and intermediate files are discarded.
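The two build stages might look roughly like this. The stage names, base image, and `COPY --from` transfer come from the description above; the working directory and file paths are assumptions:

```dockerfile
# Hypothetical sketch of the multi-stage build; paths are assumptions.
FROM python:3.10 AS model_training
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
# Training runs at build time, producing the artifacts in this layer.
RUN python src/model_training.py

FROM python:3.10 AS serving
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
# Pull only the trained artifacts out of the training stage.
COPY --from=model_training /app/my_model.keras /app/scaler_params.npz ./
EXPOSE 4000
CMD ["python", "src/main.py"]
```

Everything else from the first stage (downloaded data, intermediate files, training-only layers) is discarded, which is what keeps the final image small.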
Step 2: Run
```bash
docker run -p 80:4000 cardioscan
```
Step 3: Open the App
Navigate to:
http://localhost/predict
Step 4: Stop
Press Ctrl+C in the terminal, or:
```bash
docker stop <container_id>
```
Using the Web Interface
- Open `http://localhost/predict` in your browser.
- Fill in the 13 patient feature fields (see the sample tables below).
- Click SCAN.
- The result card appears with:
- A color-coded badge: red "DISEASE PRESENT" or green "NO DISEASE"
- A probability risk bar showing the model's confidence (0%–100%)
  - The corresponding image (`disease.jpeg` or `healthy.jpeg`)
Sample Input (High Risk)
| Field | Value |
|---|---|
| Age | 63 |
| Sex | Male |
| Chest Pain Type | Asymptomatic |
| Resting BP | 145 |
| Cholesterol | 233 |
| Fasting Blood Sugar >120 | Yes |
| Resting ECG | LV Hypertrophy |
| Max Heart Rate | 150 |
| Exercise Angina | No |
| ST Depression | 2.3 |
| ST Slope | Downsloping |
| # Major Vessels | 0 |
| Thalassemia | Fixed Defect |
Sample Input (Low Risk)
| Field | Value |
|---|---|
| Age | 35 |
| Sex | Female |
| Chest Pain Type | Non-anginal Pain |
| Resting BP | 120 |
| Cholesterol | 198 |
| Fasting Blood Sugar >120 | No |
| Resting ECG | Normal |
| Max Heart Rate | 172 |
| Exercise Angina | No |
| ST Depression | 0.0 |
| ST Slope | Upsloping |
| # Major Vessels | 0 |
| Thalassemia | Normal |
Architecture Details
Why Two Stages?
Separating training from serving is a common MLOps pattern:
- Training is resource-intensive, runs once (or periodically) and produces artifacts (model weights, scaler params).
- Serving is lightweight, runs continuously and only needs the artifacts plus inference dependencies.
Multi-Stage Dockerfile vs. Docker Compose
| Aspect | Multi-Stage Dockerfile | Docker Compose |
|---|---|---|
| How artifacts transfer | `COPY --from=model_training` between build stages | Shared named volume (`model_exchange`) |
| When training runs | At `docker build` time | At `docker-compose up` time |
| Final image contains | Only serving code + artifacts | Two separate containers |
| Best for | CI/CD pipelines, single deployable image | Local development, modular services |