georgemuriithi/kaggle-competitions

House Prices Prediction and Credit Default Risk Prediction competitions. Advanced decision tree-based regression and classification models are used.

Kaggle Competitions

House Prices Prediction and Credit Default Risk Prediction competitions.

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

https://www.kaggle.com/c/home-credit-default-risk

In both, advanced decision tree-based regression and classification models are used.

In House Prices Prediction, performance is evaluated with RMSLE (Root Mean Squared Logarithmic Error), while in Credit Default Risk Prediction it is evaluated with AUROC (Area Under the Receiver Operating Characteristic curve).
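As a rough illustration of the two metrics, here is a minimal pure-Python sketch. It assumes the common log(1 + x) form of RMSLE and the pairwise-ranking definition of AUROC; the function names `rmsle` and `auroc` are hypothetical and are not taken from the notebooks, and the exact formulas Kaggle applies are the ones stated on the competition pages.

```python
import math


def rmsle(y_true, y_pred):
    """RMSLE for non-negative values, using the common log(1 + x) form (an assumption)."""
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n^2) and meant only for illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


print(rmsle([200000.0, 150000.0], [210000.0, 140000.0]))
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A lower RMSLE is better (0 means perfect predictions), while a higher AUROC is better (0.5 corresponds to random ranking, 1.0 to perfect ranking).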

In House Prices Prediction, I ranked 816 out of 5011 with an RMSLE of 0.12549; the best score on the leaderboard was 0.00000.

In Credit Default Risk Prediction, I scored an AUROC of 0.73610, compared to the best score of 0.81724. No ranking was available.

My submissions can be accessed from the submissions folder.

Problem Description

The problems are described in detail on the Kaggle competition pages linked above.

Solution Approach

House Prices Prediction

After feature engineering, the following regression models are tested:

  • Ridge
  • BaggingRegressor
    • n_estimators=50
  • RandomForestRegressor
    • n_estimators=50
  • XGBRegressor
    • max_depth=5
    • objective='reg:squarederror'
  • LGBMRegressor
  • VotingRegressor
    • estimators=[ridge, bagging, random_forest, xgb, lgbm]
    • n_jobs=-1
  • StackingRegressor
    • estimators=[ridge, bagging, random_forest, xgb, lgbm]
    • final_estimator=Ridge
    • n_jobs=-1
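The two ensembles in the list above could be wired together roughly as follows. This is a sketch under stated assumptions, not the notebook's code: the data is synthetic (`make_regression`), the `random_state=0` arguments on the estimators are added here for reproducibility, and only the scikit-learn base models are included. `XGBRegressor` and `LGBMRegressor` would be appended to `base_estimators` in the same way.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge

# Synthetic stand-in for the engineered house price features (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Base models with the settings listed above; XGBRegressor and LGBMRegressor
# are left out so the sketch depends on scikit-learn only.
base_estimators = [
    ("ridge", Ridge()),
    ("bagging", BaggingRegressor(n_estimators=50, random_state=0)),
    ("random_forest", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# VotingRegressor averages the base models' predictions.
voting = VotingRegressor(estimators=base_estimators, n_jobs=-1)

# StackingRegressor trains a final Ridge model on the base models'
# cross-validated predictions.
stacking = StackingRegressor(
    estimators=base_estimators, final_estimator=Ridge(), n_jobs=-1
)

voting.fit(X, y)
stacking.fit(X, y)
print(voting.predict(X[:3]))
print(stacking.predict(X[:3]))
```

Both ensembles clone the base estimators when fitted, so the same `base_estimators` list can be shared between them.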

Validation setup:

  • train_test_split(test_size=0.2, random_state=0)
  • kfold = KFold(n_splits=5, shuffle=True, random_state=0)
  • cross_val_score(cv=kfold)
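A minimal sketch of how these three calls might fit together is shown below. The synthetic data, the choice of `Ridge` as the model, the `scoring="r2"` argument, and running cross-validation on the training split are all assumptions made for illustration; the notebook may combine them differently.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the engineered features and target (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Hold out 20% of the rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one model and score it on the held-out rows (R2 for regressors).
model = Ridge()
model.fit(X_train, y_train)
val_r2 = model.score(X_val, y_val)

# 5-fold cross-validation with shuffling; returns one R2 score per fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(model, X_train, y_train, cv=kfold, scoring="r2")

print(f"validation R2: {val_r2:.4f}")
print(f"cross-validation R2 mean: {cv_r2.mean():.4f}")
```

Fixing `random_state` in both the split and the `KFold` object keeps the comparison between models repeatable across runs.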

VotingRegressor performs best overall, with the best combination of validation R² score, RMSLE, and mean cross-validation R² score.

Credit Default Risk Prediction

After feature engineering, the following classification models are tested:

  • XGBClassifier
    • tree_method='gpu_hist'
    • gpu_id=0
  • LGBMClassifier
    • device='gpu'
  • RandomForestClassifier
    • n_estimators=50
  • StackingClassifier
    • estimators=[xgb, lgbm, random_forest]
    • final_estimator=LGBMClassifier
    • n_jobs=-1

Validation setup: train_test_split(test_size=0.2, random_state=42)

A GPU is used for XGBClassifier and LGBMClassifier, because this classification task requires more computation power.

LGBMClassifier performs best, with the highest validation AUROC score.
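For reference, a validation AUROC of this kind can be computed along the following lines. This is a CPU-only sketch under assumptions: the data is synthetic (`make_classification`), `RandomForestClassifier` stands in for whichever classifier is being compared, and `random_state=42` on the estimator is added here for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered credit features (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Same split settings as listed above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# AUROC is computed from the predicted probability of the positive class,
# not from the hard 0/1 predictions.
val_scores = clf.predict_proba(X_val)[:, 1]
val_auroc = roc_auc_score(y_val, val_scores)
print(f"validation AUROC: {val_auroc:.4f}")
```

The same two evaluation lines apply to any classifier that exposes `predict_proba`, so each candidate model can be scored on the identical validation split.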

Languages

Jupyter Notebook 100.0%

GNU General Public License v3.0
Created November 19, 2021
Updated February 2, 2024