tarekmasryo/road-accident-risk-ps5e10
Road accident risk regression (PS S5E10): LightGBM + residual XGBoost + NNLS blend for stable OOF RMSE.
Road Accident Risk: Residual-Boosted Risk Model
Playground Series S5E10 (Kaggle)
What is this?
Predict accident_risk for each road segment as a calibrated score in [0,1].
Goal:
- Stable CV, not leaderboard luck
- Interpretable signals (why is this road risky?)
- Zero leakage
Modeling Pipeline (3 stages)
1. LightGBM (main learner)
- Train LightGBM directly on accident_risk
- Bag multiple random seeds → smoother OOF preds
- Output: oof_lgb, pred_lgb
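A minimal sketch of seed bagging for Stage 1, under stated assumptions: scikit-learn's GradientBoostingRegressor stands in for LightGBM so the example runs without extra dependencies, the data is synthetic, plain KFold replaces the stratified folds, and the helper names make_model and seed_bagged_oof are hypothetical rather than taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold


def make_model(seed):
    # Stand-in learner so the sketch runs with scikit-learn alone;
    # lightgbm.LGBMRegressor(random_state=seed) could be returned here instead.
    return GradientBoostingRegressor(n_estimators=50, subsample=0.8, random_state=seed)


def seed_bagged_oof(X, y, X_test, seeds=(0, 1, 2), n_splits=5):
    """Average out-of-fold and test predictions over several model seeds."""
    oof = np.zeros(len(X))
    pred = np.zeros(len(X_test))
    # Folds are kept fixed (an assumption) so only the model seed varies.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for seed in seeds:
        for train_idx, valid_idx in folds.split(X):
            model = make_model(seed)
            model.fit(X[train_idx], y[train_idx])
            # Each row is scored only by models that never saw it in training.
            oof[valid_idx] += model.predict(X[valid_idx]) / len(seeds)
            pred += model.predict(X_test) / (len(seeds) * n_splits)
    return oof, pred


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = np.clip(0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 300), 0, 1)
X_test = rng.random((50, 4))

oof_lgb, pred_lgb = seed_bagged_oof(X, y, X_test)
```

Averaging over seeds reduces the variance that comes from row subsampling inside each model, which is one plausible reading of "smoother OOF preds".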
2. XGBoost residual (prior-corrected)
- Build an interpretable safety prior risk_prior ∈ [0,1] from:
  - high curvature
  - high speed limit
  - night lighting
  - bad weather
- Train XGBoost on the residual: residual_target = accident_risk - risk_prior
- At inference: pred = risk_prior + predicted_residual
- Output: oof_xgb, pred_xgb
Why? Stage 2 is only learning what the simple prior missed.
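The residual idea can be sketched as follows. This is an illustration, not the notebook's code: the prior weights, the assumed maximum speed limit of 70, and the synthetic data are all assumptions, and scikit-learn's GradientBoostingRegressor is swapped in for XGBoost to keep the example dependency-light.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def risk_prior(curvature, speed_limit, night, bad_weather):
    """Hand-written baseline in [0, 1]; the weights are illustrative assumptions."""
    score = (
        0.35 * curvature               # curvature assumed already scaled to [0, 1]
        + 0.25 * (speed_limit / 70.0)  # 70 is an assumed maximum speed limit
        + 0.20 * night                 # 1 when the segment is dark, else 0
        + 0.20 * bad_weather           # 1 for rain or fog, else 0
    )
    return np.clip(score, 0.0, 1.0)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 400
curvature = rng.random(n)
speed_limit = rng.choice([25, 35, 45, 60, 70], size=n)
night = rng.integers(0, 2, size=n)
bad_weather = rng.integers(0, 2, size=n)
X = np.column_stack([curvature, speed_limit, night, bad_weather])

# Synthetic target: the prior plus an interaction the prior does not model.
prior = risk_prior(curvature, speed_limit, night, bad_weather)
accident_risk = np.clip(prior + 0.1 * curvature * night + rng.normal(0, 0.02, n), 0, 1)

# Stage 2 fits only what the prior missed.
residual_target = accident_risk - prior
residual_model = GradientBoostingRegressor(n_estimators=100, random_state=0)
residual_model.fit(X, residual_target)

# At inference, add the predicted residual back onto the prior.
pred = prior + residual_model.predict(X)
```

Because the prior is added back after prediction, pred can leave [0,1]; in the pipeline described here, clipping happens after the blend.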
3. NNLS blend (non-negative)
- Fit Non-Negative Least Squares (NNLS) on [oof_lgb, oof_xgb]
- Get blend weights ≥ 0 (no negative canceling)
- Apply same weights to test preds
- Clip final predictions to [0,1]
- Output: final_test → submission.csv
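The blend step might look like the sketch below, which uses scipy.optimize.nnls on synthetic stand-ins for the OOF and test predictions. The arrays are made up for illustration; only the variable names follow the README.

```python
import numpy as np
from scipy.optimize import nnls

# Synthetic stand-ins for the target and the two stages' predictions.
rng = np.random.default_rng(0)
y = rng.random(500)
oof_lgb = np.clip(y + rng.normal(0, 0.05, 500), 0, 1)
oof_xgb = np.clip(y + rng.normal(0, 0.08, 500), 0, 1)
pred_lgb = rng.random(100)
pred_xgb = rng.random(100)

# Solve min ||A w - y||_2 subject to w >= 0.
A = np.column_stack([oof_lgb, oof_xgb])
weights, _ = nnls(A, y)

# Reuse the OOF-fitted weights on the test predictions, then clip to [0, 1].
final_test = np.clip(np.column_stack([pred_lgb, pred_xgb]) @ weights, 0.0, 1.0)
```

NNLS does not force the weights to sum to 1, which is one reason the final clip to [0,1] is still needed.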
Result:
- Lower OOF RMSE
- More consistent folds
- Predictions always in a valid range
Features & CV
Feature engineering
- curv_speed = curvature × speed_limit
- acc_per_lane = num_reported_accidents / num_lanes
- critical_zone = high curvature & high speed
- risk_prior = human-readable baseline danger score
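A possible pandas version of the first three features is sketched below. The README does not say what counts as "high", so the thresholds curv_thresh and speed_thresh are assumptions, as are the helper name add_features and the guard against zero lanes; risk_prior is left out here.

```python
import pandas as pd


def add_features(df, curv_thresh=0.7, speed_thresh=60):
    """Add the engineered columns; both thresholds are assumed cut-offs."""
    out = df.copy()
    out["curv_speed"] = out["curvature"] * out["speed_limit"]
    # Guard against division by zero if a segment reports no lanes.
    out["acc_per_lane"] = out["num_reported_accidents"] / out["num_lanes"].clip(lower=1)
    out["critical_zone"] = (
        (out["curvature"] >= curv_thresh) & (out["speed_limit"] >= speed_thresh)
    ).astype(int)
    return out


# Two made-up road segments, for illustration only.
demo = pd.DataFrame(
    {
        "curvature": [0.2, 0.9],
        "speed_limit": [35, 70],
        "num_reported_accidents": [1, 4],
        "num_lanes": [2, 2],
    }
)
features = add_features(demo)
```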
Cross-validation
- Stratified K-Fold on binned target quantiles
- Keeps each fold balanced (safe vs dangerous segments)
- All metrics are out-of-fold (OOF)
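Stratifying a regression target can be sketched by binning it into quantiles and passing the bin labels to StratifiedKFold. The number of bins, the number of folds, and the synthetic target below are assumptions; the notebook may use different values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for accident_risk.
rng = np.random.default_rng(0)
y = rng.beta(2, 5, size=1000)

# Bin the continuous target into quantile buckets so folds can be stratified.
n_bins = 10  # assumed bin count
y_bins = pd.qcut(y, q=n_bins, labels=False, duplicates="drop")

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_means = []
for train_idx, valid_idx in skf.split(np.zeros(len(y)), y_bins):
    # Each validation fold gets a similar share of every risk bucket.
    fold_means.append(y[valid_idx].mean())


def rmse(y_true, y_pred):
    """Root mean squared error, the metric reported on OOF predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

With folds built this way, the mean target per validation fold stays close across folds, which keeps per-fold RMSE values comparable.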
Output
The notebook will:
- Train Stage 1 → Stage 2 → Stage 3
- Blend predictions
- Write submission.csv under artifacts/ (Kaggle: /kaggle/working/artifacts/)
No external data. No test target leakage.
Repo layout
.
├── road-accident-risk-ps5e10.ipynb
├── data/
│   └── raw/              # (optional) place train/test CSVs here for local runs
├── artifacts/            # saved outputs (e.g., submission.csv)
├── repo_utils/
│   └── pathing.py        # local data/raw + Kaggle /kaggle/input fallback
├── CASE_STUDY.md
├── requirements.txt
└── .gitignore
Data loading (local + Kaggle)
The notebook resolves files in this order:
1. DATA_PATH env var (full file path)
2. Local data/raw/<filename>
3. Kaggle /kaggle/input/<dataset>/<filename>
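The lookup order above could be implemented roughly as follows. This is a sketch, not the contents of repo_utils/pathing.py: the function name resolve_data_file and its parameters are hypothetical, and treating DATA_PATH as an override for whichever file is requested is an assumption.

```python
import os
from pathlib import Path


def resolve_data_file(filename, local_dir="data/raw", kaggle_root="/kaggle/input"):
    """Return the first existing path for `filename`, following the documented order."""
    # 1. DATA_PATH env var holding a full file path.
    env_path = os.environ.get("DATA_PATH")
    if env_path and Path(env_path).is_file():
        return Path(env_path)

    # 2. Local data/raw/<filename>.
    local_path = Path(local_dir) / filename
    if local_path.is_file():
        return local_path

    # 3. Kaggle /kaggle/input/<dataset>/<filename>; any dataset folder is accepted here.
    root = Path(kaggle_root)
    if root.is_dir():
        for candidate in sorted(root.glob(f"*/{filename}")):
            return candidate

    raise FileNotFoundError(f"Could not find {filename}")
```

A call such as resolve_data_file("train.csv") would then return the local copy when it exists and fall back to the Kaggle input folder otherwise.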
For local runs, place these files under data/raw/:
- train.csv
- test.csv
- sample_submission.csv
Run locally
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt

Open road-accident-risk-ps5e10.ipynb and run top-to-bottom.
Outputs
- artifacts/submission.csv (Kaggle: /kaggle/working/artifacts/submission.csv)
Languages: Jupyter Notebook 99.6%, Python 0.4%
MIT License
Created October 26, 2025
Updated February 8, 2026