# Loan-Status-Prediction-Using-Machine-Learning

## Overview
This end-to-end machine learning project predicts loan approval status based on customer profiles. Built using Python and scikit-learn, it covers data preprocessing, feature scaling, model training, evaluation, and hyperparameter tuning. Additionally, a user-friendly GUI was developed for seamless user interaction.
## Project Workflow
- Data Loading
- Exploratory Data Analysis (EDA)
- Data Cleaning & Missing Value Handling
- Feature Engineering (Encoding Categorical Features)
- Feature Scaling
- Model Training and Evaluation
- Hyperparameter Tuning
- Cross-Validation (K-Fold)
- Model Deployment & GUI Development
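The steps above can be sketched end to end. This is a minimal illustration with a hypothetical mini-dataset (the real loan CSV and its full column set are not shown here), not the project's actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for the real loan dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"] * 5,
    "ApplicantIncome": [5000, 3000, 4000, 2500, 6000, 3500] * 5,
    "LoanAmount": [130, 100, 120, 90, 150, 110] * 5,
    "Credit_History": [1, 0, 1, 0, 1, 1] * 5,
    "Loan_Status": ["Y", "N", "Y", "N", "Y", "Y"] * 5,
})

# Encode categorical columns ("Male"/"Female" -> 1/0, "Y"/"N" -> 1/0).
for col in ["Gender", "Loan_Status"]:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns="Loan_Status")
y = df["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scale numeric features, then train and evaluate.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(round(accuracy, 3))
```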
## App Demo

(Screenshots of the GUI, including the input-category view, go here.)
## Breakdown of Key Concepts in the Code
### Feature Scaling
- Approach: Implemented StandardScaler to standardize numerical features such as income and loan amount.
- Purpose: Essential for scale-sensitive models such as Logistic Regression and SVC, so that no feature dominates simply because of its magnitude.
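A quick illustration of what StandardScaler does, using hypothetical income and loan-amount values rather than the project's data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical columns (income, loan amount) on very different scales.
X = np.array([[5000., 130.], [3000., 100.], [4000., 120.], [6000., 150.]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has mean ~0 and unit variance.
print(X_scaled.mean(axis=0).round(6))  # → [0. 0.]
print(X_scaled.std(axis=0).round(6))   # → [1. 1.]
```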
### Feature Encoding
- Approach: Utilized Label Encoding to transform categorical variables into numeric values for model compatibility.
- Example Mappings:
  - "Yes" → 1, "No" → 0
  - "Male" → 1, "Female" → 0
  - "Urban" → 2, "Semi-Urban" → 1, "Rural" → 0
### K-Fold Cross-Validation
- Approach: Applied 5-Fold Cross-Validation using cross_val_score for robust performance evaluation.
- Purpose: Helps ensure the model generalizes well across unseen data, mitigating overfitting risks.
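A minimal sketch of 5-fold cross-validation with cross_val_score, using a synthetic dataset in place of the loan data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the loan dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# cv=5 trains and scores the model on 5 different train/validation splits.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.round(3), round(scores.mean(), 3))
```

The spread of the five scores hints at how stable the model is across splits; a large spread is a warning sign for overfitting.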
### Hyperparameter Tuning
- Approach: Employed RandomizedSearchCV to optimize model hyperparameters.
- Examples:
- Logistic Regression & SVC optimized for the C (regularization) parameter.
- Outcome: Significant performance boost, especially for SVC and Random Forest models.
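A sketch of tuning SVC's C parameter with RandomizedSearchCV; the search space and synthetic data here are illustrative assumptions, not the project's actual settings:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the loan dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Sample C (regularization strength) log-uniformly across several decades.
search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2)},
    n_iter=10,
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Sampling C on a log scale is the usual choice for regularization parameters, since their effect varies by order of magnitude rather than linearly.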
## Insights from the Project

### Key Influencing Features
- Credit History: The most critical factor influencing loan approval.
- Income: Applicant and co-applicant incomes significantly impacted the decision-making process.
### Model Performance
- Baseline Performance: Logistic Regression and SVC performed reasonably well but required tuning.
- Post-Tuning Results:
- Random Forest: Achieved the highest accuracy (~80%) after tuning.
- SVC: Performance improved post-tuning but remained slightly behind Random Forest.
### Missing Values Strategy
- Minimal Missing Data (<5%): Rows with missing values in columns such as gender and dependents were dropped.
- Key Columns: Imputed columns such as self_employed and credit_history using the most frequent value (mode).
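Mode imputation can be done directly in pandas; the column and values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing entries.
df = pd.DataFrame({"Self_Employed": ["No", "Yes", np.nan, "No", np.nan, "No"]})

# Fill missing values with the most frequent value (mode).
mode_value = df["Self_Employed"].mode()[0]
df["Self_Employed"] = df["Self_Employed"].fillna(mode_value)
print(df["Self_Employed"].tolist())  # → ['No', 'Yes', 'No', 'No', 'No', 'No']
```

scikit-learn's SimpleImputer(strategy="most_frequent") achieves the same result inside a pipeline.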
### Feature Scaling
- Why It Mattered: Standardized features (income, loan amount, loan term) ensured no single feature dominated due to scale differences.
### Cross-Validation Insights
- Outcome: The 5-Fold Cross-Validation demonstrated that the Random Forest model generalizes effectively across data splits, making it the top performer.




