NokeYuan/Diabetes-Readmission-AutoGluon
AutoML-based prediction of 30-day readmissions in diabetic patients using ensemble learning with AutoGluon.
Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon
This repository presents an end-to-end pipeline for predicting 30-day hospital readmissions among diabetic patients using modern AutoML techniques. The study leverages AutoGluon, a state-of-the-art ensemble-based AutoML framework, and evaluates it against traditional machine learning, deep learning, and transformer-based tabular models.
The study validates the AutoGluon ensemble model for 30-day readmission prediction in diabetic patients, benchmarks it against diverse models, and highlights risk factors to support early intervention.
Overview
Hospital readmissions within 30 days are a major quality metric and financial burden, particularly for diabetic patients. This project builds and evaluates predictive models to assess readmission risk using structured clinical data from electronic health records (EHRs). The core contributions of this work include:
- Development of an AutoML pipeline based on AutoGluon
- Comparison of ensemble methods with traditional ML, DL, and foundation models
- Exploration of preprocessing techniques, feature importance, and subgroup performance
The results consistently show that ensemble learning via AutoGluon outperforms other models, with LightGBM and CatBoost being strong individual contenders. Deep neural networks and transformer-based models (e.g., TabPFNMix) are competitive but underperform in this static tabular setting.
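As a rough illustration of what an AutoGluon-based training step can look like, the sketch below fits a `TabularPredictor` and returns its leaderboard, which lists each base model next to the weighted ensemble. This is not the code in `src/Train_model.py`: the label column name `readmit_30d`, the `roc_auc` metric, the `best_quality` preset, and the time limit are all assumptions chosen for the example.

```python
def fit_settings(time_limit_s: int = 600) -> dict:
    """Keyword arguments for TabularPredictor.fit (illustrative values, not the study's)."""
    return {"presets": "best_quality", "time_limit": time_limit_s}


def train_and_rank(train_df, test_df, label: str = "readmit_30d"):
    """Fit an AutoGluon ensemble and return (predictor, leaderboard on test_df).

    `label` is a hypothetical column name for the binary 30-day outcome.
    """
    # Imported lazily so this module can be loaded without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(
        label=label,
        problem_type="binary",
        eval_metric="roc_auc",  # assumed metric; the paper may report others
    ).fit(train_df, **fit_settings())

    # One row per trained model, including the weighted ensemble.
    leaderboard = predictor.leaderboard(test_df)
    return predictor, leaderboard
```

With pandas DataFrames for the train and test splits, `train_and_rank(train_df, test_df)` would return the fitted predictor together with a table that allows a side-by-side comparison of individual models such as LightGBM and CatBoost against the ensemble.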
For more details, see the full paper and the presentation slides.
Dataset
This project uses the publicly available dataset from the UCI Machine Learning Repository:
Diabetes 130-US hospitals for years 1999–2008
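In the UCI file, the outcome column `readmitted` takes the values `<30`, `>30`, and `NO`, and missing entries are written as `?`. The sketch below shows one way to turn this into a binary 30-day label using only the standard library. The function names and the `readmit_30d` column are illustrative, and `src/Prep.py` may handle this step differently.

```python
import csv
import io


def to_binary_label(readmitted: str) -> int:
    """Return 1 for a readmission within 30 days, 0 otherwise ('>30' or 'NO')."""
    return 1 if readmitted.strip() == "<30" else 0


def load_rows(handle):
    """Read CSV rows, map the '?' placeholder to None, and add a binary label."""
    rows = []
    for row in csv.DictReader(handle):
        cleaned = {key: (None if value == "?" else value) for key, value in row.items()}
        cleaned["readmit_30d"] = to_binary_label(row["readmitted"])  # hypothetical column name
        rows.append(cleaned)
    return rows


# Tiny in-memory sample standing in for the real CSV (values are made up).
sample = io.StringIO(
    "encounter_id,race,readmitted\n"
    "1,Caucasian,<30\n"
    "2,?,>30\n"
    "3,AfricanAmerican,NO\n"
)
rows = load_rows(sample)
labels = [row["readmit_30d"] for row in rows]
```

Treating `>30` as a negative case matches the 30-day framing of the task, but it is a modeling choice worth stating explicitly when reporting results.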
Repository Structure
.
├── src/                  # Source code modules
│   ├── Config.py         # Global configuration and constants
│   ├── Prep.py           # Data cleaning, preprocessing, clustering
│   ├── Train_model.py    # Training logic using AutoGluon
│   ├── Utils.py          # Utility functions
│   └── Vis.py            # Visualizations (e.g., SHAP, performance plots)
│
├── notebooks/            # Jupyter notebooks
│   └── train.ipynb       # Training and evaluation
│
├── paper/                # Research paper and supplementary material
│   └── AutoGluon_Readmission_Predictions.pdf
│
├── ag.yaml               # Conda environment file
└── README.md             # This file
Installation
We recommend using a Conda environment for reproducibility:
conda env create -f ag.yaml
conda activate ag
Usage
You can run the full workflow interactively inside the Jupyter notebook:
- Launch the notebook:
    jupyter notebook train.ipynb
- Run the full pipeline:
    # Inside train.ipynb
    from src.Train_model import TrainAutoGluon
    trainer = TrainAutoGluon(...)
    trainer.run_pipeline()
- Visualize results:
    from src.Vis import plot_feature_importance, shap_summary_plot, ...
This approach allows for step-by-step inspection, debugging, and comparison.
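The internals of `plot_feature_importance` are not shown in this README. As a rough illustration of one common model-agnostic approach, the sketch below computes permutation importance: each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The toy model, the data, and the use of accuracy as the score are all assumptions made for the example. Only the feature names are taken from the UCI dataset.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows where predict(row) equals the true label."""
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = {}
    for feature in X[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[feature] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [{**row, feature: value} for row, value in zip(X, shuffled)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances[feature] = sum(drops) / n_repeats
    return importances


def toy_predict(row):
    """Made-up model: the prediction depends only on prior inpatient visits."""
    return int(row["number_inpatient"] >= 2)


# Synthetic rows; the values are fabricated for the example.
X = [{"number_inpatient": i % 4, "num_medications": (i * 7) % 5} for i in range(40)]
y = [toy_predict(row) for row in X]
scores = permutation_importance(toy_predict, X, y)
```

Because the toy model ignores `num_medications`, shuffling that column leaves accuracy unchanged, while shuffling `number_inpatient` lowers it. A feature the model relies on therefore receives a positive score, and an unused one scores zero.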
Citation
If you use this work, please cite:
@article{yuan2025readmission,
title={Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon},
author={Yuan, Baijiang},
year={2025},
note={University of Toronto, Institute of Medical Science and University Health Network}
}