# Loan-Status-Prediction-Using-Machine-Learning

## Overview
This end-to-end machine learning project predicts loan approval status based on customer profiles. Built using Python and scikit-learn, it covers data preprocessing, feature scaling, model training, evaluation, and hyperparameter tuning. Additionally, a user-friendly GUI was developed for seamless user interaction.
## Project Workflow
- Data Loading
- Exploratory Data Analysis (EDA)
- Data Cleaning & Missing Value Handling
- Feature Engineering (Encoding Categorical Features)
- Feature Scaling
- Model Training and Evaluation
- Hyperparameter Tuning
- Cross-Validation (K-Fold)
- Model Deployment & GUI Development
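The steps above can be sketched end to end. This is a minimal illustration with a hypothetical mini-dataset (the real loan CSV and its full column set are not shown here), not the project's actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for the real loan dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"] * 5,
    "ApplicantIncome": [5000, 3000, 4000, 2500, 6000, 3500] * 5,
    "LoanAmount": [130, 100, 120, 90, 150, 110] * 5,
    "Credit_History": [1, 0, 1, 0, 1, 1] * 5,
    "Loan_Status": ["Y", "N", "Y", "N", "Y", "Y"] * 5,
})

# Encode categorical columns ("Male"/"Female" -> 1/0, "Y"/"N" -> 1/0).
for col in ["Gender", "Loan_Status"]:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns="Loan_Status")
y = df["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scale numeric features, then train and evaluate.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(round(accuracy, 3))
```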
## App Demo

(Screenshots of the GUI, including the input-category view, go here.)
## Breakdown of Key Concepts in the Code
### Feature Scaling
- Approach: Implemented StandardScaler to standardize numerical features such as income and loan amount.
- Purpose: Essential for scale-sensitive models such as Logistic Regression and SVC, so that no feature dominates simply because of its magnitude.
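A quick illustration of what StandardScaler does, using hypothetical income and loan-amount values rather than the project's data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical columns (income, loan amount) on very different scales.
X = np.array([[5000., 130.], [3000., 100.], [4000., 120.], [6000., 150.]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has mean ~0 and unit variance.
print(X_scaled.mean(axis=0).round(6))  # → [0. 0.]
print(X_scaled.std(axis=0).round(6))   # → [1. 1.]
```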
### Feature Encoding
- Approach: Utilized Label Encoding to transform categorical variables into numeric values for model compatibility.
- Example Mappings:
  - "Yes" → 1, "No" → 0
  - "Male" → 1, "Female" → 0
  - "Urban" → 2, "Semi-Urban" → 1, "Rural" → 0
### K-Fold Cross-Validation
- Approach: Applied 5-Fold Cross-Validation using cross_val_score for robust performance evaluation.
- Purpose: Helps ensure the model generalizes well across unseen data, mitigating overfitting risks.
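A minimal sketch of 5-fold cross-validation with cross_val_score, using a synthetic dataset in place of the loan data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the loan dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# cv=5 trains and scores the model on 5 different train/validation splits.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.round(3), round(scores.mean(), 3))
```

The spread of the five scores hints at how stable the model is across splits; a large spread is a warning sign for overfitting.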
### Hyperparameter Tuning
- Approach: Employed RandomizedSearchCV to optimize model hyperparameters.
- Examples:
- Logistic Regression & SVC optimized for the C (regularization) parameter.
- Outcome: Significant performance boost, especially for SVC and Random Forest models.
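A sketch of tuning SVC's C parameter with RandomizedSearchCV; the search space and synthetic data here are illustrative assumptions, not the project's actual settings:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the loan dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Sample C (regularization strength) log-uniformly across several decades.
search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2)},
    n_iter=10,
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Sampling C on a log scale is the usual choice for regularization parameters, since their effect varies by order of magnitude rather than linearly.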
## Insights from the Project

### Key Influencing Features
- Credit History: The most critical factor influencing loan approval.
- Income: Applicant and co-applicant incomes significantly impacted the decision-making process.
### Model Performance
- Baseline Performance: Logistic Regression and SVC performed reasonably well but required tuning.
- Post-Tuning Results:
- Random Forest: Achieved the highest accuracy (~80%) after tuning.
- SVC: Performance improved post-tuning but remained slightly behind Random Forest.
### Missing Values Strategy
- Minimal Missing Data (<5%): Rows with missing values in columns such as gender and dependents were dropped.
- Key Columns: Imputed columns such as self_employed and credit_history using the most frequent value (mode).
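Mode imputation can be done directly in pandas; the column and values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing entries.
df = pd.DataFrame({"Self_Employed": ["No", "Yes", np.nan, "No", np.nan, "No"]})

# Fill missing values with the most frequent value (mode).
mode_value = df["Self_Employed"].mode()[0]
df["Self_Employed"] = df["Self_Employed"].fillna(mode_value)
print(df["Self_Employed"].tolist())  # → ['No', 'Yes', 'No', 'No', 'No', 'No']
```

scikit-learn's SimpleImputer(strategy="most_frequent") achieves the same result inside a pipeline.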
### Feature Scaling
- Why It Mattered: Standardized features (income, loan amount, loan term) ensured no single feature dominated due to scale differences.
### Cross-Validation Insights
- Outcome: The 5-Fold Cross-Validation demonstrated that the Random Forest model generalizes effectively across data splits, making it the top performer.




