โค๏ธ Heart Disease Diagnosis using Machine Learning and Data Mining

📋 Project Overview

This repository contains the complete implementation of my MSc dissertation project: "Heart Disease Diagnosis using Machine Learning and Data Mining." The goal is to develop predictive models that accurately diagnose heart disease from patient clinical data, helping to enhance early detection and healthcare outcomes.

The project tackles challenges such as imbalanced data and feature relevance to build robust, interpretable machine learning models that can assist clinicians in decision-making.


🎯 Objectives

  • Cleanse, normalize, and select the most impactful features from clinical datasets.

  • Build and compare multiple supervised classification models:

    • Random Forest
    • Decision Tree
    • Naive Bayes
    • Support Vector Machine (SVM)
  • Evaluate model performance using comprehensive metrics:
    Accuracy, Precision, Recall, F1-score, Confusion Matrix, and ROC-AUC.

  • Identify key clinical features driving heart disease prediction.

  • Develop a simple Tkinter-based GUI to enable easy, interactive patient risk prediction.


📊 Dataset

The dataset is a combined collection sourced from IEEE DataPort, incorporating records from:

  • Cleveland Clinic Foundation
  • Hungarian Institute of Cardiology
  • University Hospital, Zurich
  • V.A. Medical Center, Long Beach
  • Statlog (Heart) Dataset

๐Ÿ” Features

The dataset contains 11 clinical features:

  • Age
  • Sex
  • Chest Pain Type
  • Resting Blood Pressure
  • Cholesterol
  • Fasting Blood Sugar
  • Resting ECG Results
  • Maximum Heart Rate
  • Exercise-Induced Angina
  • Oldpeak (ST Depression)
  • ST Slope

โš™๏ธ Methodology

Data Preprocessing

  • Handle missing values
  • Normalize and scale features
  • Select relevant features using statistical tests and model-based rankings
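The preprocessing steps above could be sketched as a single scikit-learn pipeline. This is a minimal illustration rather than the dissertation's actual code: median imputation, standard scaling, the ANOVA F-test as the statistical test, and `k_features=8` are all assumptions, and the data is synthetic. Model-based ranking is not shown here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessor(k_features: int = 8) -> Pipeline:
    """Impute, scale, then keep the k features with the highest ANOVA F-score."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # handle missing values
        ("scale", StandardScaler()),                   # zero mean, unit variance
        ("select", SelectKBest(score_func=f_classif, k=k_features)),
    ])


# Synthetic stand-in for an 11-feature clinical table (not real patient data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

preprocessor = build_preprocessor(k_features=8)
X_clean = preprocessor.fit_transform(X, y)

# Column indices (in the original table) that survived feature selection.
selected = preprocessor.named_steps["select"].get_support(indices=True)
```

Keeping the steps in one `Pipeline` means the imputer, scaler, and selector are fitted on training data only and then reused unchanged on new patients.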

Model Development

  • Train Decision Tree, Random Forest, Naive Bayes, and SVM classifiers
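A hedged sketch of how the four classifiers might be trained side by side. The synthetic data, the 75/25 stratified split, the hyperparameters, the choice of `GaussianNB` as the Naive Bayes variant, and `class_weight="balanced"` as one way of addressing class imbalance are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in for the 11-feature dataset.
X, y = make_classification(
    n_samples=400, n_features=11, n_informative=6,
    weights=[0.6, 0.4], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(class_weight="balanced", random_state=42),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
    "Naive Bayes": GaussianNB(),
    # SVMs are sensitive to feature scale, so scaling is bundled with the model.
    "SVM": make_pipeline(StandardScaler(), SVC(class_weight="balanced", random_state=42)),
}

# Fit every model on the same split and record held-out accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
```

Using one shared split keeps the comparison between models fair; cross-validation would give more stable estimates on a dataset of this size.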

Model Evaluation

  • Assess performance with Accuracy, Precision, Recall, F1-score, Confusion Matrix, and ROC curves with AUC
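To make the metric definitions concrete, here is a small hand-rolled sketch in plain Python. It assumes binary labels with 1 meaning heart disease, and the toy labels and scores are made up for illustration. In practice `sklearn.metrics` offers equivalents such as `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `confusion_matrix`, and `roc_auc_score`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: predicted probabilities thresholded at 0.5.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)  # all four values are 0.75 here
auc = roc_auc(y_true, scores)                     # 15 of 16 pairs ranked correctly: 0.9375
```

Recall is often the metric to watch in a diagnostic setting, since a false negative means a missed case.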

Feature Importance

  • Analyze and aggregate feature importance across models to identify critical predictors
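One possible way to aggregate importance across different model types is permutation importance, which works for any fitted estimator. The sketch below is an assumption about how this could be done, not a description of the project's method: each model's scores are clipped at zero, scaled to sum to 1, and then averaged. The data is synthetic and the features are referred to by column index only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def aggregate_importance(fitted_models, X_val, y_val, n_repeats=5, seed=0):
    """Average per-model permutation importances after scaling each to sum to 1."""
    per_model = []
    for model in fitted_models:
        result = permutation_importance(
            model, X_val, y_val, n_repeats=n_repeats, random_state=seed
        )
        # Negative values mean shuffling the feature did not hurt; treat as zero.
        scores = np.clip(result.importances_mean, 0.0, None)
        total = scores.sum()
        per_model.append(scores / total if total > 0 else scores)
    return np.mean(per_model, axis=0)


# Synthetic stand-in for the 11-feature dataset.
X, y = make_classification(n_samples=400, n_features=11, n_informative=5, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

fitted = [
    DecisionTreeClassifier(random_state=1).fit(X_train, y_train),
    RandomForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train),
    GaussianNB().fit(X_train, y_train),
]

importance = aggregate_importance(fitted, X_val, y_val)
ranking = np.argsort(importance)[::-1]  # column indices, most important first
```

Normalizing before averaging stops a single model with large raw scores from dominating the combined ranking.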

๐Ÿ–ฅ๏ธ GUI Application

  • Built with Tkinter
  • User-friendly interface for inputting patient data
  • Provides heart disease prediction based on the trained Random Forest model
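A minimal sketch of what such a Tkinter form could look like. The layout, the helper names (`parse_inputs`, `predict_label`, `build_app`), and the result wording are hypothetical. It also assumes that categorical fields are entered as already-encoded numbers, that the model exposes a scikit-learn style `predict`, and that label 1 means heart disease.

```python
FEATURES = [
    "Age", "Sex", "Chest Pain Type", "Resting Blood Pressure", "Cholesterol",
    "Fasting Blood Sugar", "Resting ECG Results", "Maximum Heart Rate",
    "Exercise-Induced Angina", "Oldpeak (ST Depression)", "ST Slope",
]


def parse_inputs(raw_values):
    """Convert the 11 text-field strings to floats; raise ValueError on bad input."""
    if len(raw_values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(raw_values)}")
    row = []
    for name, text in zip(FEATURES, raw_values):
        try:
            row.append(float(text))
        except ValueError:
            raise ValueError(f"{name}: '{text}' is not a number") from None
    return row


def predict_label(model, raw_values):
    """Run the model on one patient row and return a display string."""
    prediction = model.predict([parse_inputs(raw_values)])[0]
    return "Heart disease likely" if prediction == 1 else "Heart disease unlikely"


def build_app(model):
    """Build the window; call .mainloop() on the returned root to start it."""
    import tkinter as tk  # imported here so the helpers above work without a display

    root = tk.Tk()
    root.title("Heart Disease Risk Prediction")

    # One labelled entry field per clinical feature.
    entries = []
    for row, name in enumerate(FEATURES):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w", padx=6, pady=2)
        entry = tk.Entry(root)
        entry.grid(row=row, column=1, padx=6, pady=2)
        entries.append(entry)

    result = tk.StringVar(value="Enter patient data and press Predict")

    def on_predict():
        try:
            result.set(predict_label(model, [e.get() for e in entries]))
        except ValueError as err:
            result.set(f"Invalid input: {err}")

    tk.Button(root, text="Predict", command=on_predict).grid(
        row=len(FEATURES), column=0, columnspan=2, pady=6
    )
    tk.Label(root, textvariable=result).grid(
        row=len(FEATURES) + 1, column=0, columnspan=2, pady=6
    )
    return root
```

With a fitted classifier in hand, the form could be started with `build_app(trained_model).mainloop()`. Keeping parsing and prediction outside the GUI code makes them easy to test without opening a window.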

๐Ÿ› ๏ธ Technologies Used

Python | Scikit-learn | Pandas | NumPy | Matplotlib | Seaborn | Tkinter


🌟 About Me

I'm Daniel Toluwani Adeleke, a Data Scientist & IT professional with a passion for building end-to-end data solutions.
I hold a BSc in Computer Science and an MSc in Data Science & Business Analytics. My expertise includes SQL, Python, Machine Learning, and BI reporting.

📧 Email: dannydave1000@gmail.com
💼 LinkedIn: linkedin.com/in/dannydave
🌐 Portfolio: dannydave.my_portfolio.github.io