
NokeYuan/Diabetes-Readmission-AutoGluon

AutoML-based prediction of 30-day readmissions in diabetic patients using ensemble learning with AutoGluon.

🎯 Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon

This repository presents an end-to-end pipeline for predicting 30-day hospital readmissions among diabetic patients using modern AutoML techniques. The study leverages AutoGluon, a state-of-the-art ensemble-based AutoML framework, and evaluates it against traditional machine learning, deep learning, and transformer-based tabular models.

This study validates the AutoGluon ensemble model for 30-day readmission prediction in diabetic patients, benchmarks it against a diverse set of models, and highlights risk factors that can support early intervention.


🧠 Overview

Hospital readmissions within 30 days are a major quality metric and financial burden, particularly for diabetic patients. This project builds and evaluates predictive models to assess readmission risk using structured clinical data from electronic health records (EHRs). The core contributions of this work include:

  • Development of an AutoML pipeline based on AutoGluon
  • Comparison of ensemble methods with traditional ML, DL, and foundation models
  • Exploration of preprocessing techniques, feature importance, and subgroup performance

The results consistently show that ensemble learning via AutoGluon outperforms the other models evaluated, with LightGBM and CatBoost being strong individual contenders. Deep neural networks and transformer-based models (e.g., TabPFNMix) are competitive but trail the ensemble in this static tabular setting.
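Comparisons like this one rest on a shared discrimination metric computed overall and per patient subgroup. This section does not name the metric, so the sketch below assumes AUROC, computed with the rank-based Mann-Whitney formulation (tied scores count as half a win). The function names and the toy age groups are hypothetical; in practice a library routine such as scikit-learn's roc_auc_score would normally be used.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC = P(random positive scores above random negative), via average ranks."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores so they share one average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for positives
    return u_stat / (n_pos * n_neg)


def auroc_by_group(
    labels: Sequence[int], scores: Sequence[float], groups: Sequence[Hashable]
) -> Dict[Hashable, Optional[float]]:
    """AUROC per subgroup; a group lacking either class maps to None."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {
        g: auroc(ys, ss) if 0 < sum(ys) < len(ys) else None
        for g, (ys, ss) in buckets.items()
    }


if __name__ == "__main__":
    # Toy labels and scores for illustration only (not results from the paper).
    y = [0, 0, 1, 1, 0, 1]
    model_scores = {
        "model_a": [0.1, 0.4, 0.35, 0.8, 0.2, 0.7],
        "model_b": [0.3, 0.2, 0.6, 0.9, 0.1, 0.5],
    }
    for name, s in model_scores.items():
        print(name, round(auroc(y, s), 3))
    age_group = ["<60", "<60", "<60", "60+", "60+", "60+"]
    print(auroc_by_group(y, model_scores["model_a"], age_group))
```

Returning None for single-class subgroups avoids reporting an undefined AUROC for small strata, which is common when slicing readmission data finely.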

📄 For more details, see the full paper (paper/AutoGluon_Readmission_Predictions.pdf) and the presentation slides.


📊 Dataset

This project uses the publicly available dataset from the UCI Machine Learning Repository:
Diabetes 130-US hospitals for years 1999–2008
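In the UCI release, the readmitted column takes the values <30, >30 and NO, and missing entries are written as ?. A minimal sketch, assuming that layout, of cleaning the placeholders and deriving a binary 30-day label with the standard library; the label name readmit_30d, the helper functions and the file name diabetic_data.csv are illustrative assumptions, not code taken from Prep.py.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

MISSING_TOKEN = "?"  # the UCI file writes missing values as "?"
READMITTED_VALUES = {"<30", ">30", "NO"}


def clean_value(value: Optional[str]) -> Optional[str]:
    """Map the '?' placeholder (or an absent field) to None; strip everything else."""
    if value is None:
        return None
    value = value.strip()
    return None if value == MISSING_TOKEN else value


def label_30_day(readmitted: str) -> int:
    """Binary target: 1 for '<30' (readmitted within 30 days), 0 for '>30' or 'NO'."""
    if readmitted not in READMITTED_VALUES:
        raise ValueError(f"unexpected readmitted value: {readmitted!r}")
    return 1 if readmitted == "<30" else 0


def load_encounters(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse CSV text into cleaned records, adding a hypothetical 'readmit_30d' label."""
    records: List[Dict[str, object]] = []
    for row in csv.DictReader(lines):
        record: Dict[str, object] = {k: clean_value(v) for k, v in row.items()}
        record["readmit_30d"] = label_30_day(row["readmitted"].strip())
        records.append(record)
    return records


if __name__ == "__main__":
    # Tiny inline sample; with the real download you would pass
    # open("diabetic_data.csv", newline="") instead (file name assumed).
    sample = io.StringIO(
        "encounter_id,race,readmitted\n"
        "1,Caucasian,NO\n"
        "2,?,<30\n"
        "3,AfricanAmerican,>30\n"
    )
    rows = load_encounters(sample)
    print([r["readmit_30d"] for r in rows])  # [0, 1, 0]
    print(rows[1]["race"])                   # None
```

Grouping >30 with NO treats later readmissions as negatives, which is the usual framing for a 30-day target.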


📁 Repository Structure

.
├── src/                    # Source code modules
│   ├── Config.py           # Global configuration and constants
│   ├── Prep.py             # Data cleaning, preprocessing, clustering
│   ├── Train_model.py      # Training logic using AutoGluon
│   ├── Utils.py            # Utility functions
│   └── Vis.py              # Visualizations (e.g., SHAP, performance plots)
│
├── notebooks/              # Jupyter notebooks
│   └── train.ipynb         # Training and evaluation
│
├── paper/                  # Research paper and supplementary material
│   └── AutoGluon_Readmission_Predictions.pdf
│
├── ag.yaml                 # Conda environment file
└── README.md               # This file

⚙️ Installation

We recommend using a Conda environment for reproducibility:

conda env create -f ag.yaml
conda activate ag

🚀 Usage

You can run the full workflow interactively inside the Jupyter notebook:

  1. Launch the notebook:

    jupyter notebook notebooks/train.ipynb
  2. Run the full pipeline:

    # Inside train.ipynb
    from src.Train_model import TrainAutoGluon
    trainer = TrainAutoGluon(...)
    trainer.run_pipeline()
  3. Visualize results:

    from src.Vis import plot_feature_importance, shap_summary_plot, ...

This approach allows for step-by-step inspection, debugging, and comparison.
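The TrainAutoGluon arguments are elided above, so its internals are not reproduced here. As a rough, hypothetical sketch of an equivalent run using AutoGluon's public TabularPredictor API directly: the label name readmit_30d, the excluded columns, the best_quality preset, the one-hour time limit and the roc_auc metric are all assumptions, not settings read from Train_model.py. The AutoGluon import sits inside the function so the column-selection helper can be read and tested without AutoGluon installed.

```python
from typing import Iterable, List

LABEL = "readmit_30d"  # hypothetical binary target (1 = readmitted within 30 days)
# Identifier columns and the raw outcome column must not leak into the features.
EXCLUDED = {"encounter_id", "patient_nbr", "readmitted"}


def feature_columns(columns: Iterable[str], label: str = LABEL) -> List[str]:
    """Keep every column except identifiers, the raw outcome and the label itself."""
    return [c for c in columns if c not in EXCLUDED and c != label]


def train_readmission_model(train_df, test_df, time_limit: int = 3600):
    """Fit an AutoGluon ensemble; return (predictor, leaderboard, importance).

    train_df and test_df are pandas DataFrames that already contain LABEL.
    """
    # Imported lazily so this module loads without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    cols = feature_columns(train_df.columns) + [LABEL]
    predictor = TabularPredictor(
        label=LABEL,
        problem_type="binary",
        eval_metric="roc_auc",
    ).fit(train_df[cols], presets="best_quality", time_limit=time_limit)

    test = test_df[cols]
    leaderboard = predictor.leaderboard(test)        # per-model scores on held-out data
    importance = predictor.feature_importance(test)  # permutation feature importance
    return predictor, leaderboard, importance
```

The leaderboard lists the individual base models (e.g., LightGBM, CatBoost) alongside the weighted ensemble, which is one way to reproduce the kind of comparison described in the overview.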


📚 Citation

If you use this work, please cite:

@article{yuan2025readmission,
  title={Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon},
  author={Yuan, Baijiang},
  year={2025},
  note={University of Toronto, Institute of Medical Science and University Health Network}
}