sayaliba01/Framingham_CVD_Risk
Healthcare Classification Problem
Index:
- Background
- Problem Statement
- Data Preprocessing
- Preliminary Analysis
- Data Distribution and Outliers
- Categorical Variables
- Numerical Variables
- Missing Values and Imputation
- Correlation Analysis
- Normality Check
- Undersampling Data
- Transformation Pipeline
- Modeling
- KNN Classifier
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
- Bernoulli Naive Bayes Classifier
- Bagging Classifiers
- Performance Evaluation
- Results
- Challenges and Limitations
- Future Scope
About Project:
- Identifying people at risk of heart disease and making sure they receive proper treatment can prevent deaths from it.
- Risk stratification with machine learning methods, used to identify people at risk of cardiovascular disease (CVD), can serve as a better preventive, prognostic, and management tool for the population.
Framingham Heart Study (FHS)
- The Framingham Heart Study is a long-term prospective study of the etiology of cardiovascular disease among a population of free-living subjects in the community of Framingham, Massachusetts, in the US. The data collected can be studied to identify risk factors and their joint effects.
- The given dataset is a subset of the longitudinal data collected as part of the FHS. It includes laboratory, clinic, questionnaire, and adjudicated event data on 4,434 participants, for whom 10-year coronary heart disease risk was recorded over years of surveillance.
- Original data source: available on request at https://biolincc.nhlbi.nih.gov/teaching/
Objective of the study:
The goal of the analysis is to predict whether a participant has a 10-year risk of developing coronary heart disease (CHD), based on the participant's current risk-factor data.
Questions to ask:
- Which risk factors does the dataset include?
- How strongly do the risk factors correlate with our target variable?
- How is our data distributed based on demographic data (sex, age, education level)?
- How is the behavioural data represented in our data?
- Does our target variable have balanced representation in our dataset?
- How applicable is the data in view of population demographics?
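
Two of the questions above (target balance and correlation with the target) can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library. The column names (`age`, `sex`, `smoker`, `chd_10yr`) and the six toy records are assumptions made up for illustration; they are not the dataset's actual variable names or values.

```python
from collections import Counter
from math import sqrt

# Toy records with hypothetical column names; NOT values from the FHS dataset.
rows = [
    {"age": 39, "sex": "F", "smoker": 0, "chd_10yr": 0},
    {"age": 46, "sex": "M", "smoker": 1, "chd_10yr": 0},
    {"age": 61, "sex": "M", "smoker": 1, "chd_10yr": 1},
    {"age": 52, "sex": "F", "smoker": 0, "chd_10yr": 0},
    {"age": 67, "sex": "F", "smoker": 1, "chd_10yr": 1},
    {"age": 43, "sex": "M", "smoker": 0, "chd_10yr": 0},
]

def class_balance(records, target):
    """Return the share of each target class as a dict of label -> fraction."""
    counts = Counter(r[target] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if a column is constant

# Does the target variable have balanced representation?
balance = class_balance(rows, "chd_10yr")

# How strongly does each numeric risk factor correlate with the target?
target = [r["chd_10yr"] for r in rows]
correlations = {
    col: pearson([r[col] for r in rows], target) for col in ("age", "smoker")
}

# How is the data distributed across a demographic variable?
by_sex = Counter(r["sex"] for r in rows)

print(balance)
print(correlations)
print(by_sex)
```

On the real data, the same checks would more likely be run with a data-frame library; the standard-library version is used here only to keep the sketch dependency-free. A strongly skewed class balance is what would motivate a resampling step such as the undersampling listed in the index.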
Acknowledgement: Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC), the National Heart, Lung, and Blood Institute (NHLBI), NIH, for providing the data on request.