Implementation of Regression and Classification Algorithms

This repository contains the implementation of an assignment for the Statistical Pattern Recognition course. The project builds fundamental machine learning algorithms from scratch using only NumPy, Pandas, and Matplotlib. No high-level machine learning libraries (such as scikit-learn) were used for model implementation, so the underlying mathematics and optimization steps are written out explicitly.

The project covers three main areas:

  1. Linear Regression (Batch Gradient Descent, Stochastic Gradient Descent, Closed-Form)
  2. Logistic Regression (Binary Classification)
  3. Multiclass Classification (One-vs-All, One-vs-One, Softmax Regression)

Features

1. Linear Regression

  • Dataset: Auto MPG (Predicting fuel efficiency).
  • Implementations:
    • Batch Gradient Descent (BGD)
    • Stochastic Gradient Descent (SGD)
    • Closed-Form Solution (Normal Equation)
  • Analysis: Comparison of convergence rates, cost functions, and final parameters across different optimization strategies.
  • Bonus: Ridge Regression with SGD to handle multicollinearity.
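
The three solvers listed above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the function names (`batch_gd`, `sgd`, `normal_equation`), the learning rates, and the epoch counts are all assumptions chosen for the example, and the `l2` argument shows one common way to add a ridge penalty to SGD.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so the intercept is learned as theta[0]."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def mse_cost(X, y, theta):
    """Half mean squared error: J = 1/(2m) * ||X theta - y||^2."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))

def batch_gd(X, y, lr=0.1, epochs=500):
    """Batch gradient descent: one update per pass, using all samples."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

def sgd(X, y, lr=0.01, epochs=50, l2=0.0, seed=0):
    """Stochastic gradient descent: one update per sample.

    l2 > 0 adds a ridge penalty on every weight except the bias.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ theta - y[i]) * X[i]
            penalty = l2 * theta
            penalty[0] = 0.0  # do not shrink the intercept
            theta -= lr * (grad + penalty)
    return theta

def normal_equation(X, y):
    """Closed form: solve (X^T X) theta = X^T y without an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data for illustration only (this is NOT the Auto MPG dataset).
rng = np.random.default_rng(42)
X = add_bias(rng.normal(size=(200, 3)))
true_theta = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

theta_bgd = batch_gd(X, y)
theta_sgd = sgd(X, y)
theta_cf = normal_equation(X, y)
print("BGD:        ", np.round(theta_bgd, 3))
print("SGD:        ", np.round(theta_sgd, 3))
print("Closed form:", np.round(theta_cf, 3))
```

On well-conditioned data like this, all three estimates should land close to each other. With a constant step size, SGD settles into a small neighbourhood of the closed-form solution rather than converging to it exactly.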

2. Logistic Regression

  • Dataset: Bank Marketing (Predicting term deposit subscription).
  • Implementations:
    • Binary Logistic Regression using Gradient Descent.
    • Custom stratified train-test split.
    • Feature preprocessing (Encoding, Standardization).
  • Evaluation: Accuracy, Precision, Recall, F1-Score, and Confusion Matrix.
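
As a rough sketch of how these pieces fit together, the example below implements a stratified split, binary logistic regression trained by batch gradient descent, and the listed metrics. It runs on synthetic imbalanced data rather than the Bank Marketing dataset, and the helper names (`stratified_split`, `fit_logistic`, `binary_metrics`) and hyperparameters are assumptions for illustration, not the notebook's own.

```python
import numpy as np

def stratified_split(y, test_ratio=0.2, seed=0):
    """Return train/test index arrays that keep each class's proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(len(idx) * test_ratio))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

def sigmoid(z):
    # Clip the input so exp() cannot overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y  # gradient of the loss w.r.t. the logit
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and a 2x2 confusion matrix."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": np.array([[tn, fp], [fn, tp]]),
    }

# Synthetic imbalanced data for illustration (roughly 20% positives).
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.2).astype(int)
X = rng.normal(size=(1000, 2)) + 2.0 * y[:, None]

train, test = stratified_split(y)
w, b = fit_logistic(X[train], y[train])
y_pred = (sigmoid(X[test] @ w + b) >= 0.5).astype(int)
metrics = binary_metrics(y[test], y_pred)
print(metrics)
```

Splitting per class keeps the positive rate in the test set close to that of the full data. This matters on imbalanced targets, where accuracy alone can look good even if recall is poor.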

3. Multiclass Classification

  • Dataset: Wine Dataset (Classifying wine cultivars).
  • Implementations:
    • One-vs-All (OvA)
    • One-vs-One (OvO)
    • Softmax Regression
  • Analysis: Comparison of accuracy and computational cost across the three strategies.
  • Robustness: Evaluation of model performance with and without outlier removal.
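
To make the three strategies concrete, here is a small sketch on synthetic three-class data (not the Wine dataset). The function names and hyperparameters are assumptions for illustration. For K classes, One-vs-All trains K binary classifiers, One-vs-One trains K(K-1)/2, and softmax regression trains a single model with K output columns.

```python
import numpy as np
from itertools import combinations

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_binary(X, t, lr=0.1, epochs=500):
    """Binary logistic regression by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - t
        w -= lr * (X.T @ err) / len(t)
        b -= lr * err.mean()
    return w, b

def fit_ova(X, y, n_classes):
    """One-vs-All: one classifier per class, 'class k' against the rest."""
    return [fit_binary(X, (y == k).astype(float)) for k in range(n_classes)]

def predict_ova(models, X):
    scores = np.column_stack([X @ w + b for w, b in models])
    return scores.argmax(axis=1)

def fit_ovo(X, y, n_classes):
    """One-vs-One: one classifier per unordered pair of classes."""
    models = {}
    for lo, hi in combinations(range(n_classes), 2):
        mask = (y == lo) | (y == hi)
        models[(lo, hi)] = fit_binary(X[mask], (y[mask] == hi).astype(float))
    return models

def predict_ovo(models, X, n_classes):
    votes = np.zeros((X.shape[0], n_classes), dtype=int)
    rows = np.arange(X.shape[0])
    for (lo, hi), (w, b) in models.items():
        winner = np.where(X @ w + b > 0, hi, lo)  # each pair casts one vote
        votes[rows, winner] += 1
    return votes.argmax(axis=1)

def fit_softmax(X, y, n_classes, lr=0.1, epochs=500):
    """Softmax regression: gradient descent on multiclass cross-entropy."""
    W, b = np.zeros((X.shape[1], n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        diff = softmax(X @ W + b) - Y
        W -= lr * (X.T @ diff) / len(y)
        b -= lr * diff.mean(axis=0)
    return W, b

# Three synthetic Gaussian blobs, 50 points each (illustration only).
rng = np.random.default_rng(1)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(size=(150, 2))

ova_models = fit_ova(X, y, 3)
ovo_models = fit_ovo(X, y, 3)
W, b = fit_softmax(X, y, 3)

acc_ova = np.mean(predict_ova(ova_models, X) == y)
acc_ovo = np.mean(predict_ovo(ovo_models, X, 3) == y)
acc_softmax = np.mean(softmax(X @ W + b).argmax(axis=1) == y)
print(acc_ova, acc_ovo, acc_softmax)
```

Note that the accuracies here are measured on the training data, which is enough to show the mechanics but not to compare the strategies fairly. A held-out split would be needed for that.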

Project Structure

.
├── code.ipynb              # Main Jupyter Notebook containing all implementations
├── dataset/                # Directory containing dataset files
│   ├── auto_mpg.csv
│   ├── bank-full.csv
│   └── wine_dataset.csv
└── README.md               # Project documentation

Notes

  • All models were implemented from scratch without using sklearn.linear_model or similar modules.
  • Data preprocessing steps (normalization, encoding, outlier removal) were implemented manually using NumPy and Pandas.
  • This project is for educational purposes as part of the Statistical Pattern Recognition course.
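
The manual preprocessing mentioned in these notes might look something like the sketch below. The helper names and the toy column names (`value`, `kind`) are hypothetical. The interquartile-range rule used for outlier removal is an assumption, since the notes do not say which criterion the notebook applies.

```python
import numpy as np
import pandas as pd

def standardize(train, test):
    """Z-score NumPy arrays using statistics from the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std

def one_hot(series):
    """Encode a categorical Series as one 0/1 column per category."""
    categories = sorted(series.unique())
    return pd.DataFrame(
        {f"{series.name}_{c}": (series == c).astype(int) for c in categories},
        index=series.index,
    )

def remove_outliers_iqr(df, columns, k=1.5):
    """Keep rows within [Q1 - k*IQR, Q3 + k*IQR] for every listed column.

    The IQR rule is an assumed criterion, used here only as an example.
    """
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 100.0],
    "kind": ["a", "b", "a", "b", "a"],
})

clean = remove_outliers_iqr(df, ["value"])
encoded = one_hot(df["kind"])
train_s, test_s = standardize(np.array([[1.0], [2.0], [3.0]]), np.array([[4.0]]))
print(clean)
print(encoded)
```

Computing the mean and standard deviation on the training portion and reusing them for the test portion avoids leaking test-set statistics into the model.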