eng-jonathan/DataScience_MachineLearning_and_StatisticalModeling

Queens College - Math 390.4/342W: Data Science via Machine Learning and Statistical Modeling (R)

Data Science via Machine Learning and Statistical Modeling


Mathematical Model on Marriage Success

  • Prompt | Report
  • Creates a mathematical model to numerically estimate what makes a successful marriage. It incorporates feature selection, data training methods, and possible output errors.

Predicting Apartment Prices in Queens

  • Prompt | Report | R Code
  • Uses Supervised Machine Learning to beat Zillow.com’s “zestimates”
  • Developed in R; incorporates data modeling and manipulation techniques such as data removal, munging, and imputation, along with linear and random forest regressions
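
The imputation step mentioned above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's R code: it swaps in simple median imputation with a missingness flag, and the function name `impute_median`, the column name `sq_ft`, and the toy rows are all hypothetical.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of rows with missing values in `column` filled by the median.

    A boolean "<column>_was_missing" flag is added so a downstream model
    can still see which values were originally absent.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    imputed_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[column + "_was_missing"] = row[column] is None
        if row[column] is None:
            new_row[column] = fill_value
        imputed_rows.append(new_row)
    return imputed_rows


# Made-up toy rows, for illustration only (not the project's data).
toy_rows = [{"sq_ft": 700}, {"sq_ft": None}, {"sq_ft": 900}, {"sq_ft": 1100}]
imputed = impute_median(toy_rows, "sq_ft")
```

Keeping the missingness flag is a common design choice, since the fact that a value was missing can itself carry predictive signal.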

Results:

✓ The Random Forest model predicts prices to within about $27,000, roughly 8.6% of the $315,000 average home price.

✗ The model needs more observations to predict accurately when extrapolating; however, it performs well within the included zip codes.


Course Overview:

  • Syllabus
  • Philosophy of modeling and learning using data
  • Prediction via the ordinary linear model including orthogonal projections, sum of squares identity, R² and RMSE
  • Polynomial and interaction regressions
  • Prediction with machine learning including neural nets (the perceptron), support vector machines and the tree methods CART, bagged trees and Random Forests
  • Probability estimation using logistic regression, asymmetric cost classifiers and the ROC / DET performance curves
  • Underfitting vs. overfitting and the bias-variance decomposition / tradeoff
  • Model validation including out of sample techniques such as cross validation and bootstrap validation
  • Correlation vs. causation, causal models, lurking variables and interpretations of linear model coefficients
  • Extrapolation
  • The R language will be taught formally from the ground up, along with visualization using the ggplot2 library and data manipulation using the dplyr and data.table libraries.
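
Several of the topics above (the ordinary linear model, the sum of squares identity, R², RMSE, and cross validation) fit in one short example. The sketch below is written in Python for illustration, although the course itself uses R. The data are synthetic, all function names are hypothetical, and RMSE is taken here as sqrt(SSE / n), which may differ from the degrees-of-freedom-adjusted definition used in the course.

```python
import math
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    return [b0 + b1 * xi for xi in x]


def rmse(y, y_hat):
    """Root mean squared error, assumed here to be sqrt(SSE / n)."""
    return math.sqrt(sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat)) / len(y))


def k_fold_rmse(x, y, k=5):
    """Average out-of-sample RMSE: fit on k-1 folds, score on the held-out fold."""
    fold_scores = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        test = [i for i in range(len(x)) if i % k == fold]
        b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        y_hat = predict(b0, b1, [x[i] for i in test])
        fold_scores.append(rmse([y[i] for i in test], y_hat))
    return sum(fold_scores) / k


# Synthetic data: true intercept 3, true slope 2, Gaussian noise with sd 1.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [3 + 2 * xi + rng.gauss(0, 1) for xi in x]

b0, b1 = fit_simple_ols(x, y)
y_hat = predict(b0, b1, x)
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((hi - y_bar) ** 2 for hi in y_hat)           # explained by the model
sse = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))  # residual

r_squared = 1 - sse / sst  # equals ssr / sst because SST = SSR + SSE
in_sample_rmse = rmse(y, y_hat)
cv_rmse = k_fold_rmse(x, y, k=5)
```

The identity SST = SSR + SSE holds because the model includes an intercept, which makes the residuals orthogonal to the fitted values. Comparing `in_sample_rmse` with `cv_rmse` gives a rough check for overfitting: a much larger out-of-sample error suggests the model is fitting noise.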

Incorporated Topics

  • Basic Probability Theory: axioms, conditional probability, independence and dependence
  • Modeling with discrete random variables: Bernoulli, Hypergeometric, Binomial, Poisson, Geometric, Negative Binomial, Uniform Discrete and others
  • Expectation and variance
  • Modeling with continuous random variables: Exponential, Uniform and Normal
  • Frequentist confidence intervals and hypothesis testing for one-sample proportions
  • Basic visualization of data: plots, histograms, bar charts
  • Linear algebra: Vectors, matrices, rank, transpose
  • Programming: basic data types, vectors, arrays, control flow (for, while, if, else), functions
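
As a rough illustration of the frequentist topics listed above, the following Python sketch computes a normal-approximation (Wald) confidence interval and a two-sided z-test for a one-sample proportion. The function names and the example counts (56 successes in 100 trials) are made up for illustration, and the normal approximation is an assumption that is only reasonable for moderately large samples.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Wald confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


def proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0, with the standard error computed under H0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: 56 successes out of 100 trials, tested against p0 = 0.5.
low, high = proportion_ci(56, 100)
z_stat, p_value = proportion_z_test(56, 100, 0.5)
```

With these counts the 95% interval contains 0.5, which agrees with the test not rejecting H0 at the 5% level.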