Heart Disease Diagnosis using Machine Learning and Data Mining
Project Overview
This repository contains the complete implementation of my MSc dissertation project: "Heart Disease Diagnosis using Machine Learning and Data Mining." The goal is to develop predictive models that accurately diagnose heart disease from patient clinical data, helping to enhance early detection and healthcare outcomes.
The project tackles challenges such as imbalanced data and feature relevance to build robust, interpretable machine learning models that can assist clinicians in decision-making.
Objectives
- Cleanse, normalize, and select the most impactful features from clinical datasets.
- Build and compare multiple supervised classification models:
  - Random Forest
  - Decision Tree
  - Naive Bayes
  - Support Vector Machine (SVM)
- Evaluate model performance using comprehensive metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix, and ROC-AUC.
- Identify key clinical features driving heart disease prediction.
- Develop a simple Tkinter-based GUI to enable easy, interactive patient risk prediction.
Dataset
The dataset is a combined collection sourced from IEEE DataPort, incorporating records from:
- Cleveland Clinic Foundation
- Hungarian Institute of Cardiology
- University Hospital, Zurich
- V.A. Medical Center, Long Beach
- Statlog (Heart) Dataset
Features
The dataset contains 11 clinical features:
- Age
- Sex
- Chest Pain Type
- Resting Blood Pressure
- Cholesterol
- Fasting Blood Sugar
- Resting ECG Results
- Maximum Heart Rate
- Exercise-Induced Angina
- Oldpeak (ST Depression)
- ST Slope
Methodology
Data Preprocessing
- Handle missing values
- Normalize and scale features
- Select relevant features using statistical tests and model-based rankings
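The three preprocessing steps above can be chained in a single scikit-learn pipeline. The following is a minimal sketch, not the project's actual code: the data is synthetic, the column names are hypothetical, and median imputation, standard scaling, and an ANOVA F-test selector are assumed choices standing in for whichever methods the dissertation used.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the real schema.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(30, 80, n).astype(float),
    "resting_bp": rng.normal(130, 15, n),
    "cholesterol": rng.normal(240, 40, n),
    "max_heart_rate": rng.normal(150, 20, n),
    "oldpeak": rng.exponential(1.0, n),
})
# Synthetic target that depends mainly on oldpeak, so selection has a signal to find.
y = (X["oldpeak"] + rng.normal(0, 0.5, n) > 1.0).astype(int)
# Blank out some cholesterol values to simulate missing data.
X.loc[rng.choice(n, 20, replace=False), "cholesterol"] = np.nan

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # handle missing values
    ("scale", StandardScaler()),                         # zero mean, unit variance
    ("select", SelectKBest(score_func=f_classif, k=3)),  # keep the 3 top-scoring features
])
X_selected = preprocess.fit_transform(X, y)

# Names of the columns that survived feature selection.
selected = X.columns[preprocess.named_steps["select"].get_support()]
print(list(selected))
```

Keeping the steps in one pipeline means the imputer, scaler, and selector are fitted on training data only when the pipeline is later used inside cross-validation.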
Model Development
- Train Decision Tree, Random Forest, Naive Bayes, and SVM classifiers
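Training the four classifiers side by side might look like the sketch below. It uses a synthetic 11-feature dataset from `make_classification` in place of the real records, and the hyperparameters (tree depth, number of trees, scaling before the SVM) are illustrative assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 11-feature clinical dataset.
X, y = make_classification(n_samples=400, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters here are illustrative assumptions, not tuned values.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    # SVMs are sensitive to feature scale, so scaling is bundled with the model.
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {test_accuracy[name]:.3f}")
```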
Model Evaluation
- Assess performance with Accuracy, Precision, Recall, F1-score, Confusion Matrix, and ROC-AUC curves
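As a rough illustration of how these metrics can be computed with scikit-learn, the sketch below evaluates a Random Forest on synthetic data. The dataset and model settings are assumptions for demonstration only; the numbers it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the real clinical records.
X, y = make_classification(n_samples=400, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)                 # hard class labels
y_score = clf.predict_proba(X_test)[:, 1]    # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),  # uses scores, not hard labels
}
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Note that ROC-AUC is computed from predicted probabilities, while the other metrics use the thresholded class labels.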
Feature Importance
- Analyze and aggregate feature importance across models to identify critical predictors
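One possible way to aggregate importance across different model families is sketched below. It assumes permutation importance as a common, model-agnostic measure, normalizes each model's scores to sum to 1, and averages them. The source does not state which aggregation method was used, so this scheme, the synthetic data, and the `feature_i` names are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": GaussianNB(),
}

def normalized_importance(model, X, y):
    """Permutation importance, clipped at zero and scaled to sum to 1."""
    result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    scores = np.clip(result.importances_mean, 0.0, None)
    total = scores.sum()
    return scores / total if total > 0 else scores

# For brevity importance is measured on the training data;
# a held-out set would give a less optimistic picture.
table = pd.DataFrame(
    {name: normalized_importance(model.fit(X, y), X, y) for name, model in models.items()},
    index=feature_names,
)
table["mean"] = table.mean(axis=1)  # simple average across the model columns
ranking = table.sort_values("mean", ascending=False)
print(ranking.round(3))
```

Normalizing before averaging keeps a model with large raw scores from dominating the combined ranking.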
GUI Application
- Built with Tkinter
- User-friendly interface for inputting patient data
- Provides heart disease prediction based on the trained Random Forest model
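A Tkinter front end of this kind could be structured roughly as follows. This is a hypothetical sketch, not the repository's GUI: it uses only five of the eleven fields, the model is trained on synthetic data (so its outputs are meaningless clinically), and `predict_risk` and `build_gui` are illustrative names. Tkinter is imported inside `build_gui` so the prediction logic can be used without a display.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical subset of input fields, chosen for illustration only.
FIELDS = ["Age", "Resting Blood Pressure", "Cholesterol", "Maximum Heart Rate", "Oldpeak"]

# Placeholder model trained on synthetic data; a real app would load the trained model.
X, y = make_classification(
    n_samples=300, n_features=len(FIELDS), n_informative=3, n_redundant=0, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def predict_risk(raw_values):
    """Convert text inputs to floats and return the positive-class probability."""
    values = [float(v) for v in raw_values]  # raises ValueError on non-numeric input
    return float(model.predict_proba([values])[0, 1])

def build_gui():
    """Build the input form and return the root window without starting the event loop."""
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Heart Disease Risk (sketch)")

    # One labelled entry box per input field.
    entries = []
    for row, label in enumerate(FIELDS):
        tk.Label(root, text=label).grid(row=row, column=0, sticky="w", padx=5, pady=2)
        entry = tk.Entry(root)
        entry.grid(row=row, column=1, padx=5, pady=2)
        entries.append(entry)

    result = tk.StringVar(value="Enter values and press Predict")

    def on_predict():
        try:
            probability = predict_risk([e.get() for e in entries])
        except ValueError:
            messagebox.showerror("Invalid input", "All fields must be numeric.")
            return
        result.set(f"Estimated probability of heart disease: {probability:.2f}")

    tk.Button(root, text="Predict", command=on_predict).grid(
        row=len(FIELDS), column=0, columnspan=2, pady=5
    )
    tk.Label(root, textvariable=result).grid(row=len(FIELDS) + 1, column=0, columnspan=2)
    return root
```

Calling `build_gui().mainloop()` opens the window. Keeping `predict_risk` separate from the widgets makes the prediction logic testable without a GUI.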
Technologies Used
Python | Scikit-learn | Pandas | NumPy | Matplotlib | Seaborn | Tkinter
About Me
I'm Daniel Toluwani Adeleke, a Data Scientist & IT professional with a passion for building end-to-end data solutions.
I hold a BSc in Computer Science and an MSc in Data Science & Business Analytics. My expertise includes SQL, Python, Machine Learning, and BI reporting.
Email: dannydave1000@gmail.com
LinkedIn: linkedin.com/in/dannydave
Portfolio: dannydave.my_portfolio.github.io
