shruti-sivakumar/Credit-Card-Anomaly-GMM
Unsupervised anomaly detection for credit card fraud using Gaussian Mixture Models (GMMs). Models legitimate behavior via probability density estimation and flags low-likelihood transactions as potential frauds.
Credit Card Fraud Detection via Anomaly Detection (GMM)
This project implements unsupervised anomaly detection on credit card transactions using Gaussian Mixture Models (GMMs). It focuses on identifying fraudulent transactions in a highly imbalanced dataset by modeling the likelihood of legitimate behavior and flagging low-probability anomalies.
Problem Statement
Fraudulent transactions are rare but costly. Only about 0.17% of the transactions in this dataset are fraudulent, so conventional supervised classifiers tend to favor the majority class. This project instead uses probability density estimation to model legitimate behavior and flags outliers as potential frauds, without needing labeled fraud examples for training.
Dataset
- Source: Kaggle – Credit Card Fraud Detection (download the dataset manually and place it at `data/creditcard.csv`)
- Size: 284,807 transactions
- Fraud Cases: 492 (~0.172%)
- Features: PCA-transformed V1–V28 + Time, Amount, and Class (0 = legit, 1 = fraud)
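As a quick sanity check of the class imbalance described above, the label counts can be inspected right after loading. The snippet below is a minimal sketch: the helper name `class_balance` and the tiny demo frame are illustrative assumptions, not code from the notebook. With the real file in place, the commented `read_csv` line would be used instead of the demo frame.

```python
import pandas as pd

def class_balance(df, label_col="Class"):
    """Return per-class counts and the fraction of rows labeled 1 (fraud)."""
    counts = df[label_col].value_counts().sort_index()
    fraud_rate = counts.get(1, 0) / len(df)
    return counts, fraud_rate

# With the real dataset (path taken from the Dataset section):
# df = pd.read_csv("data/creditcard.csv")

# Tiny stand-in frame so the sketch runs without the CSV (hypothetical values).
demo = pd.DataFrame({"Amount": [10.0, 25.5, 3.2, 99.0], "Class": [0, 0, 0, 1]})
counts, fraud_rate = class_balance(demo)
print(counts.to_dict(), f"fraud rate: {fraud_rate:.2%}")
```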
⚙️ Methodology
Feature Selection
Selected the features most strongly correlated with the fraud label (`Class`): V14, V17, V11, V4, V15, V13
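One plausible way to produce such a ranking is to sort features by the absolute Pearson correlation with `Class`. The README does not say which correlation measure the notebook uses, so treat the function below (`rank_features_by_class_correlation`, a hypothetical name) and the synthetic data as assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def rank_features_by_class_correlation(df, label_col="Class", top_k=6):
    """Rank features by absolute Pearson correlation with the label column."""
    corr = df.corr()[label_col].drop(label_col)
    return corr.abs().sort_values(ascending=False).head(top_k)

# Synthetic stand-in for data/creditcard.csv (assumption, not real data):
# V14 and V17 are shifted for fraud rows, V20 is pure noise.
rng = np.random.default_rng(0)
n = 5000
y = (rng.random(n) < 0.02).astype(int)
demo = pd.DataFrame({
    "V14": rng.normal(size=n) - 6.0 * y,
    "V17": rng.normal(size=n) - 5.0 * y,
    "V20": rng.normal(size=n),
    "Class": y,
})

top = rank_features_by_class_correlation(demo, top_k=2)
print(top)
```

Taking the absolute value matters here because a strongly negative correlation is just as informative as a strongly positive one.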
Distribution Analysis
Used seaborn and matplotlib to visualize class-wise distributions and assess Gaussian fit.
GMM Modeling
- Trained a Gaussian Mixture Model on legitimate transactions using `V14` and `V17`
- Used the Expectation-Maximization (EM) algorithm to estimate parameters
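The modeling step can be sketched with scikit-learn's `GaussianMixture`, whose `fit` method runs EM internally. The number of components, the covariance type, and the synthetic two-column array standing in for the legitimate `V14`/`V17` values are all assumptions; the README does not state the settings used in the notebook.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the [V14, V17] columns of legitimate transactions.
rng = np.random.default_rng(42)
X_legit = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(1500, 2)),
    rng.normal(loc=4.0, scale=0.5, size=(500, 2)),
])

# n_components=2 and covariance_type="full" are illustrative choices.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X_legit)  # EM: alternates responsibilities (E-step) and parameter updates (M-step)

print("converged:", gmm.converged_, "after", gmm.n_iter_, "iterations")
print("mixture weights:", gmm.weights_)
print("component means:\n", gmm.means_)
```

With the real data, `X_legit` would be the rows where `Class == 0`, restricted to the two selected columns.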
Likelihood Scoring & Thresholding
- Computed log-likelihood scores for each transaction
- Set a threshold `T` to classify low-likelihood samples as fraud
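A minimal sketch of the scoring and thresholding step follows. `score_samples` returns the per-sample log-likelihood under the fitted mixture, and a transaction is flagged when its score falls below the threshold. How the threshold is chosen in the notebook is not stated here, so the percentile rule below is purely an assumption, as are the synthetic data and the single-component model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_legit = rng.normal(0.0, 1.0, size=(2000, 2))   # stand-in for legit [V14, V17]
X_fraud = rng.normal(-8.0, 1.0, size=(20, 2))    # stand-in for fraud, far from legit

# Fit on legitimate samples only (n_components=1 keeps the sketch simple).
gmm = GaussianMixture(n_components=1, random_state=0).fit(X_legit)

# Log-likelihood of every transaction under the "legitimate" density.
X_all = np.vstack([X_legit, X_fraud])
log_lik = gmm.score_samples(X_all)

# Hypothetical threshold rule: the 1st percentile of the legit training scores.
T = np.percentile(gmm.score_samples(X_legit), 1.0)
is_fraud = log_lik < T

print("threshold T:", round(float(T), 3))
print("flagged:", int(is_fraud.sum()), "of", len(X_all))
```

Lowering the threshold trades recall for precision: fewer legitimate transactions are flagged, but borderline frauds are more likely to slip through.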
Evaluation
- Precision, Recall, F1-Score for class 1 (fraud)
- Plotted Precision-Recall Curve
- Computed AUCPR = 0.679
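The evaluation metrics listed above could be computed along these lines with `sklearn.metrics`. The labels and log-likelihood scores are synthetic and the threshold value is hypothetical, so the printed numbers will not match the results reported in this README. Note that `precision_recall_curve` expects higher scores for the positive class, so the log-likelihood is negated first.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, precision_recall_fscore_support

# Synthetic stand-ins: 1000 legit and 20 fraud samples with made-up log-likelihoods.
rng = np.random.default_rng(1)
y_true = np.r_[np.zeros(1000, dtype=int), np.ones(20, dtype=int)]
log_lik = np.r_[rng.normal(-3.0, 1.0, 1000), rng.normal(-10.0, 3.0, 20)]

# Higher anomaly score = more fraud-like, so negate the log-likelihood.
anomaly_score = -log_lik
precision, recall, thresholds = precision_recall_curve(y_true, anomaly_score)
aucpr = auc(recall, precision)

# Point metrics for the fraud class at one hypothetical threshold T.
T = -6.0
y_pred = (log_lik < T).astype(int)
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1, zero_division=0
)

print(f"AUCPR={aucpr:.3f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The area under the precision-recall curve is generally more informative than ROC AUC on data this imbalanced, because it ignores the large number of true negatives.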
Results Summary
| Metric | Legit Class (0) | Fraud Class (1) |
|---|---|---|
| Precision | 1.00 | 0.95 |
| Recall | 1.00 | 0.72 |
| F1-Score | 1.00 | 0.82 |
| AUCPR | - | 0.679 |
Repository Structure
Credit-Card-Anomaly-GMM/
├── data/
│ └── creditcard.csv # Dataset (or link in README)
├── anomaly_detection_gmm.ipynb # Full GMM pipeline notebook
├── README.md
└── LICENSE
Output images are present in the Jupyter notebook.
Author
Built by Shruti Sivakumar as a focused showcase of probabilistic anomaly detection applied to real-world financial fraud.
License
MIT License – see LICENSE for details.