default741/py-mlxl

A comprehensive, interactive Machine Learning pipeline built with Dash and Scikit-Learn. Automates the end-to-end workflow from data cleaning and advanced feature selection to baseline modeling and hyperparameter tuning with Optuna, all within a user-friendly GUI.

PY-MLXL: Generalized Machine Learning Pipeline with Dash UI

Overview

PY-MLXL is a comprehensive machine learning pipeline tool that pairs a Python backend (Scikit-Learn, Optuna, XGBoost, LightGBM) with an interactive Plotly Dash user interface. It is designed to streamline the end-to-end ML workflow, from data ingestion to training and saving a deployment-ready model.

Key Features

  • Exploratory Data Analysis (EDA): Interactive visualization of numerical and categorical features.
  • Data Transformation: Automated cleaning, missing-value imputation, scaling, and feature transformations.
  • Feature Selection: Multiple methods including ANOVA, Mutual Information, and Recursive Feature Elimination.
  • Baseline Modeling: Quick comparison of multiple algorithms (Random Forest, XGBoost, LightGBM, SVM, etc.).
  • Hyperparameter Tuning: Automated tuning using Optuna.
  • Final Model Training: Train and save production-ready models.
  • Result Visualization: ROC Curves, Confusion Matrices, and Feature Importance plots.
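The feature-selection and baseline-modeling steps above can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not PY-MLXL's own code: the helper names top_k_features and baseline_scores, the default K, and the synthetic data from make_classification are hypothetical, and XGBoost/LightGBM are left out so the sketch needs only scikit-learn and NumPy.

```python
# Hypothetical helpers for illustration only; not PY-MLXL's API.
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

K = 5  # assumed number of features to keep


def top_k_features(X, y, k=K):
    """Return the column indices chosen by ANOVA F-test, mutual information and RFE."""
    anova = SelectKBest(f_classif, k=k).fit(X, y)
    # Fixed random_state makes the mutual-information estimate reproducible.
    mi = SelectKBest(partial(mutual_info_classif, random_state=0), k=k).fit(X, y)
    # RFE repeatedly drops the weakest coefficient of a linear model.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
    return {
        "anova": np.flatnonzero(anova.get_support()),
        "mutual_info": np.flatnonzero(mi.get_support()),
        "rfe": np.flatnonzero(rfe.get_support()),
    }


def baseline_scores(X, y, cv=5):
    """Mean cross-validated ROC AUC per candidate model, best first."""
    models = {
        "logreg": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "svm": SVC(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        # Impute and scale inside the pipeline so each CV fold is fitted independently.
        pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
    print(top_k_features(X, y))
    print(baseline_scores(X, y))
```

Imputation and scaling sit inside the pipeline so that each cross-validation fold fits them on its own training split, which keeps test-fold statistics out of the baseline scores.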

Installation

  1. Clone the repository.
  2. Install dependencies:
    pip install -r requirements-dev.txt
  3. Run the application:
    python app.py

Usage Flow

  1. Upload Data: Navigate to the EDA section and upload your raw CSV data (e.g., data/raw/paint_quality_assurance_data.csv).
  2. Transform Data: Go to Classification -> Data Transform and select the target variable and the transformation steps.
  3. Select Features: Run Feature Selection to identify top predictors.
  4. Baseline: Run Baseline Modeling to find the best-performing algorithms.
  5. Tune: Optimize the best model using the Hyperparameter Tuning module. You can either automatically tune the best performers from the Baseline step or manually select specific models and feature sets.
  6. Train: Train the final model on the full dataset.
  7. Visualize: Analyze the final model's performance in the Visualization tab.

Technologies

  • Frontend: Dash, Plotly, Dash Bootstrap Components
  • Backend: Scikit-Learn, Imbalanced-Learn, Optuna, XGBoost, LightGBM
  • Data Handling: Pandas, NumPy

Languages

Python 63.5%, Jupyter Notebook 23.2%, TypeScript 13.0%, JavaScript 0.1%, CSS 0.1%, PowerShell 0.0%, Shell 0.0%

Created January 31, 2023
Updated December 12, 2025