🎯 Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon

This repository presents an end-to-end pipeline for predicting 30-day hospital readmissions among diabetic patients using modern AutoML techniques. The study leverages AutoGluon, a state-of-the-art ensemble-based AutoML framework, and evaluates it against traditional machine learning, deep learning, and transformer-based tabular models.

This study contributes by validating the AutoGluon ensemble model for 30-day readmission prediction in diabetic patients, benchmarking it against traditional machine learning, deep learning, and transformer-based models, and highlighting risk factors to support early intervention.

🧠 Overview

The 30-day readmission rate is a major hospital quality metric, and readmissions impose a substantial financial burden, particularly for diabetic patients. This project builds and evaluates predictive models of readmission risk using structured clinical data from electronic health records (EHRs). The core contributions of this work are:

- Development of an AutoML pipeline based on AutoGluon
- Comparison of ensemble methods with traditional ML, DL, and foundation models
- Exploration of preprocessing techniques, feature importance, and subgroup performance
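Subgroup performance, listed above, amounts to computing a discrimination metric separately on each patient group. The dependency-free sketch below assumes ROC AUC as the metric and a hypothetical `group` value per patient (e.g., an age band); the data are made up, and this is not the evaluation code in `src/`.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank-sum formula; tied scores share their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1  # extend the block of tied scores
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined when a subgroup contains only one class
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_by_subgroup(groups, labels, scores):
    """Compute ROC AUC separately on the rows belonging to each subgroup."""
    buckets = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, labels, scores):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}


# Made-up labels and model scores for two hypothetical age bands.
groups = ["<60", "<60", "<60", "<60", "60+", "60+", "60+", "60+"]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.1, 0.8]
print(auc_by_subgroup(groups, labels, scores))  # {'<60': 1.0, '60+': 0.75}
```

A gap between subgroup AUCs (here 1.0 vs 0.75 on toy data) is the kind of signal a subgroup analysis looks for; small subgroups make such estimates noisy.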

The results consistently show that ensemble learning via AutoGluon outperforms the other models evaluated, with LightGBM and CatBoost being strong individual contenders. Deep neural networks and transformer-based models (e.g., TabPFNMix) remain competitive but underperform the ensemble in this static tabular setting.
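AutoGluon combines its base models (such as LightGBM and CatBoost) through stacking and a final weighted ensemble; to our understanding, that weighting step follows Caruana et al.'s greedy ensemble selection. The stdlib-only sketch below illustrates the idea on validation-set probabilities, assuming log loss as the objective (AutoGluon optimizes whichever `eval_metric` is configured). Model names and numbers are made up; this is not AutoGluon's implementation or the pipeline in `src/Train_model.py`.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy, clipping probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def greedy_ensemble_selection(model_probs, labels, n_iterations=10):
    """Greedy forward selection with replacement (Caruana-style).

    model_probs maps a model name to its validation-set positive-class probabilities.
    Returns {model_name: weight}, with weight = times selected / n_iterations.
    """
    names = list(model_probs)
    ensemble_sum = [0.0] * len(labels)  # running sum of the selected models' probabilities
    counts = {name: 0 for name in names}
    for step in range(1, n_iterations + 1):
        best_name, best_loss = None, float("inf")
        for name in names:
            # Candidate = uniform average of the models already picked plus this one.
            candidate = [(s + p) / step for s, p in zip(ensemble_sum, model_probs[name])]
            loss = log_loss(labels, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        ensemble_sum = [s + p for s, p in zip(ensemble_sum, model_probs[best_name])]
    return {name: c / n_iterations for name, c in counts.items() if c > 0}


# Toy validation data (made-up numbers): neither model is best on every row.
labels = [1, 1, 0, 0]
val_probs = {
    "LightGBM": [0.9, 0.5, 0.1, 0.5],
    "CatBoost": [0.5, 0.8, 0.5, 0.2],
}
print(greedy_ensemble_selection(val_probs, labels, n_iterations=2))  # {'LightGBM': 0.5, 'CatBoost': 0.5}
```

The first step picks the model with the lowest loss alone; the second picks the other one because the 50/50 blend has lower log loss than either model by itself. Selecting with replacement lets a strong model earn a larger weight by being picked repeatedly.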

📄 For more details, see the full paper (paper/AutoGluon_Readmission_Predictions.pdf) and the presentation slides.

📊 Dataset

This project uses the publicly available dataset from the UCI Machine Learning Repository:
Diabetes 130-US hospitals for years 1999–2008
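In the raw UCI data, the `readmitted` column takes the values `<30`, `>30`, and `NO`, and missing entries in several columns are encoded as `?`. A common framing, and an assumption here rather than a description of `src/Prep.py`, labels only `<30` as a positive 30-day readmission. The sketch below uses only the standard library; the file name `diabetic_data.csv` matches the UCI download as far as we know, but treat it as an assumption.

```python
import csv
import io


def prepare_rows(rows, missing_marker="?"):
    """Replace '?' placeholders with None and add a binary 30-day label.

    Assumption: readmitted == '<30' is the positive class; '>30' and 'NO' are negative.
    """
    prepared = []
    for row in rows:
        clean = {k: (None if v == missing_marker else v) for k, v in row.items()}
        clean["readmit_30d"] = 1 if row["readmitted"] == "<30" else 0
        prepared.append(clean)
    return prepared


# With the real download (file name assumed):
# with open("diabetic_data.csv", newline="") as f:
#     rows = prepare_rows(csv.DictReader(f))
sample = io.StringIO(
    "race,age,readmitted\n"
    "Caucasian,[70-80),<30\n"
    "?,[50-60),>30\n"
    "AfricanAmerican,[60-70),NO\n"
)
rows = prepare_rows(csv.DictReader(sample))
print([r["readmit_30d"] for r in rows])  # [1, 0, 0]
print(rows[1]["race"])  # None
```

Grouping `>30` with `NO` makes the task a binary "readmitted within 30 days or not" problem, which matches the 30-day framing used throughout this README.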

📁 Repository Structure

```
.
├── src/                    # Source code modules
│   ├── Config.py           # Global configuration and constants
│   ├── Prep.py             # Data cleaning, preprocessing, clustering
│   ├── Train_model.py      # Training logic using AutoGluon
│   ├── Utils.py            # Utility functions
│   └── Vis.py              # Visualizations (e.g., SHAP, performance plots)
│
├── notebooks/              # Jupyter notebooks
│   └── train.ipynb         # Training and evaluation
│
├── paper/                  # Research paper and supplementary material
│   └── AutoGluon_Readmission_Predictions.pdf
│
├── ag.yaml                 # Conda environment file
├── README.md               # This file
```

⚙️ Installation

We recommend using a Conda environment for reproducibility:

```bash
conda env create -f ag.yaml
conda activate ag
```

🚀 Usage

You can run the full workflow interactively inside the Jupyter notebook:

Launch the notebook from the repository root:
```bash
jupyter notebook notebooks/train.ipynb
```

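Run the full pipeline:

```python
# Inside notebooks/train.ipynb
import sys

sys.path.append("..")  # make the repository root (and thus `src`) importable when the kernel runs in notebooks/

from src.Train_model import TrainAutoGluon

trainer = TrainAutoGluon(...)  # constructor arguments are elided here; see notebooks/train.ipynb
trainer.run_pipeline()
```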

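Visualize results:

```python
# Plotting helpers live in src/Vis.py; further functions there are elided ("...") in this README
from src.Vis import plot_feature_importance, shap_summary_plot
```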

Working in the notebook allows step-by-step inspection, debugging, and comparison of models.
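Feature importance, one of the stated contributions, can be estimated model-agnostically with permutation importance: shuffle one feature column, re-score the model, and record the drop in the metric. AutoGluon's `TabularPredictor.feature_importance()` reports permutation importance; the stdlib sketch below only illustrates the idea with a hypothetical `predict_proba` callable, a toy model, and made-up rows, and is not the code in `src/Vis.py`.

```python
import random


def pairwise_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict_proba, rows, labels, features, n_repeats=5, seed=0):
    """Mean drop in AUC when one feature's values are shuffled across rows.

    predict_proba is a hypothetical callable: list of dict rows -> positive-class probabilities.
    """
    rng = random.Random(seed)
    baseline = pairwise_auc(labels, predict_proba(rows))
    importances = {}
    for feature in features:
        drops = []
        for _ in range(n_repeats):
            values = [row[feature] for row in rows]
            rng.shuffle(values)
            # Build permuted copies so the original rows are not modified.
            permuted = [{**row, feature: v} for row, v in zip(rows, values)]
            drops.append(baseline - pairwise_auc(labels, predict_proba(permuted)))
        importances[feature] = sum(drops) / n_repeats
    return importances


# Toy model that only looks at prior inpatient visits (made-up data).
def toy_model(batch):
    return [min(1.0, 0.2 * r["number_inpatient"]) for r in batch]


rows = [{"number_inpatient": n, "noise": z} for n, z in [(0, 7), (1, 3), (2, 9), (3, 1), (4, 5), (5, 2)]]
labels = [0, 0, 0, 1, 1, 1]
scores = permutation_importance(toy_model, rows, labels, ["number_inpatient", "noise"])
print(scores["noise"])  # 0.0, because the toy model ignores this column
```

A feature the model never uses gets exactly zero importance, while shuffling `number_inpatient` breaks the toy model's perfect ranking and lowers its AUC.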

📚 Citation

If you use this work, please cite:

```bibtex
@article{yuan2025readmission,
  title={Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon},
  author={Yuan, Baijiang},
  year={2025},
  note={University of Toronto, Institute of Medical Science and University Health Network}
}
```

NokeYuan/Diabetes-Readmission-AutoGluon