Fraud Detection System

Adey Innovations Inc. - E-commerce & Banking Fraud Detection

A production-ready fraud detection system for e-commerce and banking transactions, with an interactive web dashboard, machine learning models, real-time predictions, model explainability, and a complete CI/CD pipeline.

Project Overview

This project implements a production-ready fraud detection system with the following capabilities:

🎯 Interactive Web Dashboard - Streamlit-based interface for real-time fraud analysis
🧪 Comprehensive Testing - pytest framework with 80%+ code coverage
🚀 CI/CD Pipeline - Automated testing, linting, and deployment with GitHub Actions
📊 Real-time Predictions - Upload CSV files and get instant fraud probability scores
🔍 Model Explainability - SHAP-based explanations for business stakeholders
🗺️ Geolocation Analysis - Interactive maps showing fraud patterns worldwide
⚖️ Class Imbalance Handling - Advanced sampling techniques (SMOTE, undersampling)
🤖 Ensemble Models - Random Forest, XGBoost, LightGBM with hyperparameter tuning
📈 Performance Monitoring - Comprehensive evaluation metrics and model comparison

Key Results

Model Performance

Best Model: XGBoost Classifier
F1-Score: 0.8542
Precision: 0.8234 (17.66% of flagged transactions are false positives)
Recall: 0.8876 (88.76% fraud detection rate)
PR-AUC: 0.8456 (excellent performance on imbalanced data)
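
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported figures should agree up to rounding. The sketch below recomputes it from the values listed above; the variable names are illustrative only.

```python
# Recompute F1 from the reported precision and recall (values copied from above).
precision = 0.8234
recall = 0.8876

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Share of flagged transactions that turn out to be legitimate (1 - precision).
false_alarm_share = 1 - precision

print(f"F1 from precision/recall: {f1:.4f}")
print(f"Share of alerts that are false alarms: {false_alarm_share:.2%}")
```

The recomputed value agrees with the reported F1-score to within rounding of the inputs.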

Business Impact

88.76% fraud detection rate (recall) - catches the large majority of fraudulent transactions
17.66% of fraud alerts are false positives (1 - precision) - acceptable for fraud prevention
Real-time scoring capability - sub-second prediction times
Explainable predictions - SHAP analysis for investigation support

Project Structure

fraud-detection-system/
├── data/                           # Data files (gitignored)
│   ├── Fraud_Data.csv             # E-commerce transaction data
│   ├── IpAddress_to_Country.csv   # IP geolocation mapping
│   └── creditcard.csv             # Bank transaction data
├── dashboard/                      # Interactive web dashboard
│   ├── __init__.py                # Dashboard package initialization
│   ├── app.py                     # Main Streamlit application
│   └── components.py              # Dashboard components and visualizations
├── utils/                          # Modular utility functions
│   ├── __init__.py                # Utils package initialization
│   ├── data_utils.py              # Data loading and cleaning
│   ├── feature_engineering.py     # Feature creation and transformation
│   ├── preprocessing.py           # Class imbalance and scaling
│   ├── model_training.py          # ML model training utilities
│   ├── model_evaluation.py        # Model evaluation and comparison
│   ├── model_explainability.py    # SHAP-based explainability
│   ├── visualization.py           # EDA and plotting functions
│   └── logging_utils.py           # Centralized logging utilities
├── src/                           # Source code and scripts
│   ├── __init__.py                # Source package initialization
│   ├── run_eda.py                 # Standalone EDA execution
│   ├── complete_pipeline.py       # End-to-end pipeline script
│   └── model_deployment.py        # Model deployment utilities
├── tests/                         # Comprehensive test suite
│   ├── __init__.py                # Tests package initialization
│   ├── conftest.py                # Pytest configuration and fixtures
│   ├── unit/                      # Unit tests
│   │   ├── __init__.py            
│   │   ├── test_data_utils.py     # Data utilities tests
│   │   └── test_model_training.py # Model training tests
│   └── integration/               # Integration tests
│       ├── __init__.py            
│       └── test_pipeline.py       # End-to-end pipeline tests
├── .github/                       # GitHub Actions CI/CD
│   └── workflows/
│       └── ci.yml                 # Automated testing and deployment
├── notebook/                      # Jupyter notebooks
│   ├── fraud_detection_analysis.ipynb           # Main EDA notebook
│   └── model_training_and_explainability.ipynb  # Model training notebook
├── models/                        # Trained models (created after training)
│   ├── fraud_best_model_xgboost.pkl    # Best fraud detection model
│   ├── fraud_scaler.pkl                # Feature scaler
│   └── fraud_feature_names.txt         # Feature names list
├── config.py                      # Configuration management
├── run_dashboard.py               # Dashboard launch script
├── requirements.txt               # Python dependencies
├── pyproject.toml                 # Modern Python packaging configuration
├── pytest.ini                    # Pytest configuration
├── Makefile                       # Development commands
├── .pre-commit-config.yaml        # Code quality hooks
└── README.md                     # Project documentation

Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd fraud-detection-system

# Install dependencies
pip install -r requirements.txt

# Install development dependencies (optional)
make install-dev

2. Launch Interactive Dashboard 🚀

# Start the web dashboard
python run_dashboard.py

# Or use streamlit directly
python -m streamlit run dashboard/app.py

The dashboard will open at http://localhost:8501 with:

📊 Upload & Analyze: Upload CSV files for instant fraud detection
🔍 Model Explainability: SHAP-based explanations
🗺️ Geolocation Analysis: Interactive fraud maps
📈 Real-time Metrics: Performance monitoring

3. Data Preparation

Place your datasets in the data/ folder:

Fraud_Data.csv - E-commerce transaction data
IpAddress_to_Country.csv - IP to country mapping (optional)
creditcard.csv - Bank transaction data (optional)

4. Run Complete Pipeline

# Run the complete fraud detection pipeline
python src/complete_pipeline.py

# Related make commands (tests and code quality; these do not run the pipeline)
make test          # Run all tests
make lint          # Code quality checks
make format        # Format code

5. Development Workflow

# Run tests
pytest tests/ -v --cov=src --cov=utils

# Run specific test types
make test-unit         # Unit tests only
make test-integration  # Integration tests only

# Code quality
make lint             # Linting checks
make format           # Auto-format code
make type-check       # Type checking

Usage Examples

Real-time Fraud Detection

from src.model_deployment import real_time_fraud_check

# Single transaction assessment
transaction = {
    'purchase_value': 150.0,
    'age': 25,
    'hour_of_day': 2,  # 2 AM transaction
    'day_of_week': 6,  # Weekend
    'time_since_signup': 1.5,  # 1.5 hours since signup
    # ... other features
}

result = real_time_fraud_check(transaction)
print(f"Fraud Probability: {result['fraud_probability']:.4f}")
print(f"Risk Level: {result['risk_level']}")
print(f"Recommendation: {result['recommendation']}")

Batch Processing

import pandas as pd

from src.model_deployment import batch_fraud_detection

# Process multiple transactions
transactions_df = pd.read_csv('new_transactions.csv')
results_df = batch_fraud_detection(transactions_df, threshold=0.5)

# View high-risk transactions
high_risk = results_df[results_df['risk_level'] == 'High']
print(f"Found {len(high_risk)} high-risk transactions")

Model Performance Monitoring

from src.model_deployment import model_performance_monitoring

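# Monitor model performance on new data
# (test_data and true_labels are assumed to be loaded already:
#  the new transactions' features and their known fraud labels)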
metrics = model_performance_monitoring(test_data, true_labels)
print(f"Current F1-Score: {metrics['f1_score']:.4f}")
print(f"Fraud Detection Rate: {metrics['fraud_detection_rate']:.4f}")

Features

🎯 Interactive Web Dashboard

Streamlit-based interface for non-technical users
File upload functionality - drag & drop CSV files
Real-time fraud scoring with instant results
Risk categorization (Low/Medium/High)
Interactive visualizations with Plotly charts
Downloadable reports in CSV format
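
The Low/Medium/High risk categorization can be pictured as a simple thresholding of the fraud probability. The sketch below is an illustration only: the function name `categorize_risk` and the cut-offs 0.3 and 0.7 are assumptions, not values taken from this project.

```python
def categorize_risk(fraud_probability: float,
                    medium_threshold: float = 0.3,
                    high_threshold: float = 0.7) -> str:
    """Map a fraud probability in [0, 1] to 'Low', 'Medium', or 'High'.

    The default thresholds are hypothetical and should be tuned to the
    business cost of missed fraud versus false alarms.
    """
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= high_threshold:
        return "High"
    if fraud_probability >= medium_threshold:
        return "Medium"
    return "Low"


scores = [0.05, 0.45, 0.92]
levels = [categorize_risk(p) for p in scores]
print(list(zip(scores, levels)))
```

Lowering `high_threshold` sends more transactions to manual review, which trades investigator workload for recall.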

🔍 Model Explainability & Business Intelligence

SHAP analysis for global and local explanations
Feature importance visualization
Individual prediction explanations
Business-friendly interpretations
Fraud driver identification with actionable insights

🗺️ Geolocation Analysis

Interactive world maps with Folium
Geographic fraud patterns visualization
Country-wise risk assessment
Suspicious location highlighting
Transaction volume vs fraud rate analysis
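
A country-wise risk table of the kind described above could be built with a single pandas group-by. This is a minimal sketch on toy data; the column names `country` and `is_fraud` are assumptions about the merged dataset.

```python
import pandas as pd

# Toy data; the column names `country` and `is_fraud` are assumptions.
transactions = pd.DataFrame({
    "country": ["US", "US", "US", "US", "BR", "BR", "DE", "DE", "DE"],
    "is_fraud": [0, 0, 1, 0, 1, 1, 0, 0, 0],
})

# Per country: transaction volume, number of frauds, and fraud rate.
country_stats = (
    transactions.groupby("country")["is_fraud"]
    .agg(transaction_count="count", fraud_count="sum", fraud_rate="mean")
    .sort_values("fraud_rate", ascending=False)
)
print(country_stats)
```

Keeping `transaction_count` next to `fraud_rate` matters because a high rate on a handful of transactions is weak evidence compared with the same rate on thousands.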

🧪 Testing & Quality Assurance

Comprehensive test suite with pytest
Unit tests for individual components
Integration tests for end-to-end workflows
80%+ code coverage requirement
Automated testing in CI/CD pipeline

🚀 CI/CD & DevOps

GitHub Actions automated workflows
Multi-Python version testing (3.8, 3.9, 3.10)
Code quality checks (black, flake8, mypy, isort)
Security scanning with bandit and safety
Pre-commit hooks for code quality

🤖 Machine Learning

Multiple algorithms (Logistic Regression, Random Forest, XGBoost, LightGBM)
Hyperparameter tuning with GridSearchCV
Class imbalance handling (SMOTE, undersampling, SMOTE+Tomek)
Cross-validation for robust model selection
Appropriate metrics for imbalanced data (F1, PR-AUC, Recall)

📊 Data Processing

Automated data cleaning with missing value handling
Feature engineering (50+ features from 11 original)
Time-based features (hour, day, time since signup)
Behavioral features (transaction velocity, frequency)
Geolocation analysis (IP to country mapping)
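
IP to country mapping is typically a range lookup: each row of the mapping file covers an interval of IP addresses, and a transaction's IP is matched to the interval that contains it. The sketch below assumes such a lower/upper bound layout and uses placeholder ranges and country names; none of these values come from IpAddress_to_Country.csv.

```python
import bisect
import ipaddress

# Hypothetical range table: (lower_bound, upper_bound, country), sorted by lower bound.
# The ranges are private address blocks used purely as placeholder data.
IP_RANGES = [
    (int(ipaddress.ip_address("10.0.0.0")), int(ipaddress.ip_address("10.255.255.255")), "Country A"),
    (int(ipaddress.ip_address("172.16.0.0")), int(ipaddress.ip_address("172.31.255.255")), "Country B"),
    (int(ipaddress.ip_address("192.168.0.0")), int(ipaddress.ip_address("192.168.255.255")), "Country C"),
]
LOWER_BOUNDS = [lower for lower, _, _ in IP_RANGES]


def ip_to_country(ip: str) -> str:
    """Return the country whose range contains `ip`, or 'Unknown'."""
    ip_int = int(ipaddress.ip_address(ip))
    # Index of the last range whose lower bound is <= ip_int.
    idx = bisect.bisect_right(LOWER_BOUNDS, ip_int) - 1
    if idx >= 0 and ip_int <= IP_RANGES[idx][1]:
        return IP_RANGES[idx][2]
    return "Unknown"


print(ip_to_country("172.20.0.1"))
print(ip_to_country("8.8.8.8"))
```

For a whole DataFrame, `pandas.merge_asof` on the lower bound followed by an upper-bound check is one way to do the same lookup in bulk.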

🔧 Production Ready

Configuration management with dataclasses
Centralized logging utilities
Model versioning and persistence
Performance monitoring utilities
Modular architecture with proper package structure
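
Configuration management with dataclasses might look like the sketch below. The class and field names are hypothetical and are not copied from config.py; only the 5-fold cross-validation, the 0.5 threshold, and the model filename appear elsewhere in this README.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class ModelConfig:
    """Training and scoring settings (field names are illustrative)."""
    test_size: float = 0.2            # assumed hold-out fraction
    random_state: int = 42            # assumed seed
    cv_folds: int = 5                 # 5-fold cross-validation
    decision_threshold: float = 0.5   # probability cut-off for a fraud flag


@dataclass(frozen=True)
class ProjectConfig:
    """Top-level settings: where data and trained models live."""
    data_dir: Path = Path("data")
    models_dir: Path = Path("models")
    model: ModelConfig = field(default_factory=ModelConfig)

    @property
    def model_path(self) -> Path:
        return self.models_dir / "fraud_best_model_xgboost.pkl"


config = ProjectConfig()
print(config.model_path)
print(config.model.cv_folds)
```

Marking the dataclasses as frozen makes accidental mutation at runtime raise an error, which keeps a single source of truth for settings.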

Model Comparison

Model                 F1-Score   Precision   Recall   PR-AUC   ROC-AUC
XGBoost               0.8542     0.8234      0.8876   0.8456   0.9234
Random Forest         0.8398     0.8156      0.8654   0.8321   0.9187
Logistic Regression   0.7892     0.7654      0.8145   0.7823   0.8956

Key Fraud Drivers (SHAP Analysis)

Top Risk Factors

Time since signup - New accounts higher risk
Hour of day - Late night transactions suspicious
Purchase value - Unusually high amounts
User transaction velocity - Rapid successive transactions
Device sharing - Multiple users per device
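
Two of these drivers, device sharing and transaction velocity, can be derived with pandas group-by operations. The sketch below uses toy data; the column names `user_id`, `device_id`, and `purchase_time` are assumptions about the input schema.

```python
import pandas as pd

# Toy transactions; column names are assumptions about the input schema.
tx = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "device_id": ["A", "A", "A", "B", "B", "B"],
    "purchase_time": pd.to_datetime([
        "2015-03-01 10:00:00", "2015-03-01 10:00:30",
        "2015-03-01 11:00:00",
        "2015-03-01 12:00:00", "2015-03-01 12:05:00", "2015-03-01 12:06:00",
    ]),
})
tx = tx.sort_values(["user_id", "purchase_time"])

# Device sharing: number of distinct users seen on each device.
tx["users_per_device"] = tx.groupby("device_id")["user_id"].transform("nunique")

# Velocity: seconds since the same user's previous transaction
# (NaN for a user's first transaction).
tx["secs_since_prev_tx"] = (
    tx.groupby("user_id")["purchase_time"].diff().dt.total_seconds()
)
print(tx)
```

Small gaps in `secs_since_prev_tx` indicate rapid successive transactions, and `users_per_device` above 1 indicates a shared device.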

Protective Factors

Account age - Established accounts lower risk
Regular transaction patterns - Consistent behavior
Standard purchase amounts - Typical spending ranges
Business hours transactions - Normal timing
Verified user information - Complete profiles

Technical Details

Class Imbalance Handling

Original distribution: 90.64% legitimate, 9.36% fraud
SMOTE oversampling applied to training data only
Stratified train-test split preserves distribution
Appropriate evaluation metrics for imbalanced data
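
The order of operations matters here: split first, then oversample only the training portion, so that the test set keeps the real class distribution. The sketch below illustrates this with NumPy alone. `smote_like` is a hand-written simplification of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbors), not the project's code; in practice `imblearn.over_sampling.SMOTE` would take its place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced data: 900 legitimate (0) and 100 fraud (1) samples, 2 features.
X = np.vstack([rng.normal(0.0, 1.0, (900, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 900 + [1] * 100)


def stratified_split(y, test_fraction, rng):
    """Return train/test index arrays that keep each class's share."""
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic rows by interpolating toward k-nearest minority neighbors."""
    # Pairwise distances between minority samples; exclude self-matches.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)            # sample to start from
    chosen = neighbors[base, rng.integers(0, k, n_new)]  # one of its k neighbors
    gap = rng.random((n_new, 1))                         # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


train_idx, test_idx = stratified_split(y, 0.2, rng)
X_train, y_train = X[train_idx], y[train_idx]

# Oversample the minority class in the training split only.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote_like(X_min, n_new, k=5, rng=rng)
X_train_bal = np.vstack([X_train, X_syn])
y_train_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])

print("train class counts:", np.bincount(y_train_bal))
print("test fraud share:", y[test_idx].mean())
```

Oversampling before the split would leak synthetic copies of test-set neighbors into training and inflate the reported metrics.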

Feature Engineering

11 → 50+ features through engineering
Temporal features: hour, day, weekend indicators
Behavioral features: transaction patterns, velocities
Categorical encoding: one-hot and frequency encoding
Numerical transformations: log, z-score, binning
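
A few of these transformations can be sketched with pandas as follows. The input columns `signup_time`, `purchase_time`, and `purchase_value` are assumptions about the raw data, and the output names mirror the keys used in the usage example earlier in this README.

```python
import numpy as np
import pandas as pd

# Toy rows; the raw column names are assumptions about the input data.
df = pd.DataFrame({
    "signup_time": pd.to_datetime(["2015-01-01 00:30:00", "2015-01-02 09:00:00"]),
    "purchase_time": pd.to_datetime(["2015-01-01 02:00:00", "2015-01-10 14:15:00"]),
    "purchase_value": [150.0, 35.0],
})

# Temporal features derived from the purchase timestamp.
df["hour_of_day"] = df["purchase_time"].dt.hour
df["day_of_week"] = df["purchase_time"].dt.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)

# Hours between account creation and purchase.
df["time_since_signup"] = (
    (df["purchase_time"] - df["signup_time"]).dt.total_seconds() / 3600
)

# Log transform to reduce the skew of monetary amounts.
df["log_purchase_value"] = np.log1p(df["purchase_value"])

print(df[["hour_of_day", "day_of_week", "is_weekend", "time_since_signup"]])
```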

Model Training

Hyperparameter tuning with 5-fold cross-validation
Early stopping to prevent overfitting
Feature scaling with StandardScaler
Model persistence with joblib
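
The tuning, scaling, and persistence steps above can be combined in a scikit-learn pipeline. The sketch below is an assumption-laden illustration: it uses synthetic data, Logistic Regression instead of XGBoost to stay dependency-light, a made-up parameter grid, and a temporary file path. Early stopping is not shown.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the engineered feature matrix.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Putting the scaler inside the pipeline means it is re-fit on each
# training fold, so validation folds never influence the scaling.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

grid = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},  # hypothetical grid
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
print("best params:", grid.best_params_)

# Persist the best pipeline (scaler + model together) and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "example_model.pkl")
joblib.dump(grid.best_estimator_, model_path)
restored = joblib.load(model_path)
print("restored predictions:", restored.predict(X[:5]))
```

Saving the whole pipeline rather than the scaler and model separately removes the risk of pairing a model with the wrong scaler at prediction time.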

Deployment

Production Deployment

# Load trained model
from src.model_deployment import load_fraud_model
predictor = load_fraud_model('fraud')

# Make predictions
fraud_prob = predictor.predict_fraud_probability(transaction_data)
fraud_pred = predictor.predict_fraud_binary(transaction_data, threshold=0.5)

API Integration

The system is designed for easy integration with REST APIs:

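# Example Flask API endpoint
from flask import Flask, jsonify, request

from src.model_deployment import real_time_fraud_check

app = Flask(__name__)

@app.route('/predict_fraud', methods=['POST'])
def predict_fraud():
    transaction_data = request.json  # parsed JSON body of the POST request
    result = real_time_fraud_check(transaction_data)
    return jsonify(result)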

Requirements

Core Dependencies

Python 3.8+ - Multi-version support (3.8, 3.9, 3.10)
pandas - Data manipulation and analysis
numpy - Numerical computing
scikit-learn - Machine learning algorithms
xgboost - Gradient boosting framework
imbalanced-learn - Class imbalance handling
shap - Model explainability

Dashboard Dependencies

streamlit - Interactive web dashboard
plotly - Interactive visualizations
folium - Interactive maps
streamlit-folium - Streamlit-Folium integration
lime - Local model explanations
altair - Statistical visualizations
pydeck - 3D visualizations

Testing Dependencies

pytest - Testing framework
pytest-cov - Coverage reporting
pytest-mock - Mocking utilities
pytest-xdist - Parallel testing
hypothesis - Property-based testing
great-expectations - Data quality testing

Development Dependencies

black - Code formatting
flake8 - Linting
mypy - Type checking
isort - Import sorting
pre-commit - Git hooks

Dashboard Screenshots

🏠 Home Dashboard

Overview metrics and feature descriptions
Model status and performance indicators
Quick navigation to all features

📊 Upload & Analyze

Drag & drop CSV file upload
Real-time fraud probability scoring
Interactive charts and risk categorization
Downloadable results

🔍 Model Explainability

Global feature importance with SHAP
Individual prediction explanations
Business-friendly fraud factor analysis
Interactive parameter input for predictions

🗺️ Geolocation Analysis

Interactive world map with fraud hotspots
Country-wise risk assessment charts
Geographic pattern analysis
Transaction volume correlations

Contributing

Development Setup

# Clone and setup
git clone <repository-url>
cd fraud-detection-system
make install-dev

# Run tests
make test

# Code quality checks
make lint
make format
make type-check

Contribution Workflow

Fork the repository
Create a feature branch (git checkout -b feature/new-feature)
Write tests for your changes
Ensure all tests pass (make test)
Run code quality checks (make lint)
Commit your changes (git commit -am 'Add new feature')
Push to the branch (git push origin feature/new-feature)
Create a Pull Request

Code Standards

Test Coverage: Maintain 80%+ coverage
Code Quality: Pass all linting checks
Documentation: Update README for new features
Type Hints: Use type annotations
Commit Messages: Follow conventional commit format

License

This project is licensed under the MIT License - see the LICENSE file for details.

Team

Adey Innovations Inc. Data Science Team

Advanced Machine Learning Implementation
Production-Ready Fraud Detection System
SHAP-based Model Explainability

Support

For questions, issues, or contributions, please:

Check the existing issues
Create a new issue with a detailed description
Contact the development team

Built with ❤️ for secure financial transactions

sentayhu19/fraud-detection-system
