mmkearns96/brfss-healthy-aging-project

Predicting self-reported health in seniors who participated in the 2015 Behavioral Risk Factor Surveillance System (BRFSS) survey.

brfss-healthy-aging-project

Title: Predicting Self-Rated Health in Seniors: Analysis of the 2015 BRFSS Survey With Machine Learning Techniques

Author: Madeleine May Kearns, BSc, MA

Data source: https://www.cdc.gov/brfss/annual_data/annual_2015.html

Background: Research has consistently shown that socioeconomic status, physical activity, smoking, diet, and alcohol
consumption have a significant effect on health in older individuals. However, most of the current literature has evaluated
the unique effect of isolated attributes. Currently, there are only a few studies assessing the cumulative effects
of demographic characteristics and lifestyle behaviours on self-reported health in seniors.

Objective: This project uses the 2015 BRFSS survey to answer the following research questions:

  1. what demographic characteristics are associated with good health in individuals aged 60 years or older,
  2. what lifestyle behaviours are associated with good health in individuals aged 60 years or older, and
  3. what is the cumulative predictive power of demographic and lifestyle characteristics on good health
    in individuals aged 60 years or older?

Methodology overview: Cross-sectional analyses will be conducted to compare demographic characteristics and lifestyle behaviours
between seniors who reported good and poor general health in the survey. For predictive modeling, classification approaches will be
used to assess the cumulative predictive power of demographic and lifestyle characteristics on health in individuals aged 60 years
or older. The hyperparameters of the best-performing model will be tuned, and feature importance will be assessed to determine
the variables that most strongly predict health in older individuals.
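
The overview does not say how feature importance is computed. One model-agnostic option is permutation importance: shuffle one
feature at a time and measure how much the model's accuracy drops. The sketch below is a minimal illustration in Python, not the
project's RMarkdown code; `toy_model`, the toy data, and all function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 drives the label, column 1 is unrelated noise.
rows = [[i % 7, (i * 3) % 5] for i in range(40)]
labels = [1 if row[0] >= 3 else 0 for row in rows]


def toy_model(row):
    """Stand-in for a fitted classifier; it looks only at column 0."""
    return 1 if row[0] >= 3 else 0


scores = permutation_importance(toy_model, rows, labels)
```

Here the score for column 1 is zero because the stand-in model never reads it, while shuffling column 0 lowers accuracy. With a
real classifier, ranking the scores would give one candidate list of the most important features.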

Specific methodological steps:

  1. Variable and respondent selection (age 60 years or older; 43 variables)
  2. Removal of respondents with >15% missing data
  3. Outlier removal (winsorizing)
  4. Exploratory analyses (univariate, bivariate with general health)
  5. Normalization of the dataset, imputation, and test-train split
  6. Predictive modelling (classification)
  7. Feature importance and hyperparameter tuning of the best-performing algorithm
  8. Multivariate analyses of the ten most important features
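
As a rough illustration of steps 2, 3, and 5 above, the following Python sketch filters respondents by missingness, winsorizes,
normalizes, imputes, and splits the data. It is not the project's RMarkdown code. Only the 15% missingness threshold comes from
the list above; the 5th/95th percentile cut-offs, min-max scaling, median imputation, the 80/20 split, and all function names are
assumptions, and every column is treated as numeric for simplicity.

```python
import random
import statistics


def drop_sparse_rows(rows, max_missing=0.15):
    """Keep respondents whose share of missing (None) values is at most max_missing."""
    return [r for r in rows if sum(v is None for v in r) / len(r) <= max_missing]


def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values to the [lower, upper] percentile range (assumed cut-offs)."""
    observed = sorted(v for v in values if v is not None)
    lo = observed[int(lower * (len(observed) - 1))]
    hi = observed[int(upper * (len(observed) - 1))]
    return [None if v is None else min(max(v, lo), hi) for v in values]


def min_max_scale(values):
    """Rescale observed values to [0, 1]; missing values pass through unchanged."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [None if v is None else (v - lo) / span for v in values]


def impute_median(values):
    """Replace missing values with the column median (assumed imputation method)."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]


def preprocess(rows, max_missing=0.15):
    """Apply the row filter, then winsorize, scale, and impute column by column."""
    kept = drop_sparse_rows(rows, max_missing)
    columns = [list(col) for col in zip(*kept)]
    columns = [impute_median(min_max_scale(winsorize(col))) for col in columns]
    return [list(row) for row in zip(*columns)]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out test_fraction of them; returns (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

A call such as `train, test = train_test_split(preprocess(rows))` would chain the steps in the order listed, where `rows` is a
list of per-respondent value lists with `None` marking missing answers.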

To use this project: Download the XPT file from the CDC link above. The code is provided as RMarkdown documents that can be
downloaded and run in your local environment. Begin with the data_cleaning file, then move on to the initial_results file.

Current Maintainer: Madeleine May Kearns

Created September 6, 2022
Updated November 29, 2022