
Anomaly Detection Advanced

Python
scikit-learn
NumPy
Pandas
License
Docker

Plataforma modular de detecção de anomalias com 11 algoritmos cobrindo métodos estatísticos, aprendizado de máquina, séries temporais e ensemble, projetada para pipelines de monitoramento em produção.

Modular anomaly detection platform with 11 algorithms spanning statistical, machine learning, time-series and ensemble methods, designed for production monitoring pipelines.

Português | English


Português

Sobre

Sistema profissional de detecção de anomalias que integra quatro famílias de algoritmos em uma arquitetura orientada a objetos com hierarquia de classes abstrata (BaseDetector). Inclui detectores estatísticos clássicos (Z-Score, Modified Z-Score/MAD, Grubbs, IQR), detectores baseados em ML (Isolation Forest, LOF, One-Class SVM, DBSCAN), detectores temporais (decomposição sazonal STL, CUSUM, suavização exponencial) e um ensemble configurável com estratégias de voting, averaging e stacking. Complementado por um extrator de features robusto que gera estatísticas rolling, features de lag, componentes FFT e diferenças para alimentar os detectores.
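
A hierarquia de classes descrita acima pode ser esboçada em poucas linhas. Trata-se de um esboço mínimo e hipotético: apenas os métodos `fit`, `predict` e `fit_predict` aparecem nos exemplos deste README; os demais detalhes (nomes de atributos, validações) são suposições, e a implementação real em `statistical.py` pode diferir.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseDetector(ABC):
    """Contrato comum a todos os detectores: fit/predict com rótulos 0/1."""

    @abstractmethod
    def fit(self, X):
        ...

    @abstractmethod
    def predict(self, X):
        ...

    def fit_predict(self, X):
        # Conveniência herdada por todos os detectores concretos
        self.fit(X)
        return self.predict(X)


class ZScoreDetector(BaseDetector):
    """Detector concreto: |z| acima do limiar marca anomalia."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean()
        self.std_ = X.std()
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        z = np.abs(X - self.mean_) / self.std_
        return (z > self.threshold).astype(int)
```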

Tecnologias

| Tecnologia | Versão | Papel |
|---|---|---|
| Python | 3.9+ | Linguagem principal |
| NumPy | >= 1.24.0 | Computação vetorizada |
| Pandas | >= 2.0.0 | Engenharia de features |
| SciPy | >= 1.11.0 | Distribuições estatísticas (Grubbs, z-scores) |
| scikit-learn | >= 1.3.0 | Isolation Forest, LOF, SVM, DBSCAN, Scaler |
| pytest | >= 7.4.0 | Suíte de testes |
| Docker | - | Containerização |

Arquitetura

graph TD
    subgraph Detectores["Detectores de Anomalia"]
        subgraph Stat["Estatísticos"]
            ZS["Z-Score"]
            MZ["Modified Z-Score<br/>(MAD)"]
            GR["Grubbs Test"]
            IQ["IQR Detector"]
        end
        subgraph ML["Machine Learning"]
            IF["Isolation Forest"]
            LO["LOF"]
            SVM["One-Class SVM"]
            DB["DBSCAN"]
        end
        subgraph TS["Séries Temporais"]
            SD["Decomposição<br/>Sazonal (STL)"]
            CU["CUSUM"]
            ES["Suavização<br/>Exponencial"]
        end
    end

    subgraph Ensemble["Ensemble"]
        EN["EnsembleDetector"]
        EN --> |voting| V["Votação Ponderada"]
        EN --> |averaging| AV["Média Normalizada"]
        EN --> |stacking| ST["Meta-Learner<br/>(LogisticRegression)"]
    end

    subgraph Features["Engenharia de Features"]
        FE["FeatureExtractor"]
        FE --> RO["Rolling Stats"]
        FE --> LA["Lag Features"]
        FE --> FF["FFT Spectral"]
        FE --> DI["Diff/Rate-of-Change"]
    end

    BD["BaseDetector<br/>(ABC)"]
    BD --> ZS & MZ & GR & IQ
    BD --> IF & LO & SVM & DB
    BD --> SD & CU & ES
    BD --> EN

    FE -.-> Detectores

Fluxo de Processamento

sequenceDiagram
    participant D as Dados Brutos
    participant FE as FeatureExtractor
    participant DET as Detector(es)
    participant ENS as Ensemble
    participant R as Resultado

    D->>FE: Série temporal / tabular
    FE->>FE: Rolling stats + Lags + FFT + Diff
    FE->>DET: Feature matrix

    DET->>DET: fit(X_train)
    DET->>DET: predict(X_test)
    DET-->>R: labels (0=normal, 1=anomalia)

    Note over ENS: Combinação opcional
    DET->>ENS: scores de N detectores
    ENS->>ENS: voting / averaging / stacking
    ENS-->>R: labels finais

Estrutura do Projeto

anomaly-detection-advanced/
├── src/
│   ├── __init__.py
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── statistical.py               # BaseDetector + 4 detectores (411 LOC)
│   │   ├── ml_based.py                  # 4 detectores ML (414 LOC)
│   │   ├── ensemble.py                  # EnsembleDetector (216 LOC)
│   │   └── timeseries.py                # 3 detectores temporais (412 LOC)
│   ├── features/
│   │   ├── __init__.py
│   │   └── feature_extractor.py         # FeatureExtractor (318 LOC)
│   ├── data/
│   │   └── __init__.py
│   ├── evaluation/
│   │   └── __init__.py
│   └── pipeline/
│       └── __init__.py
├── tests/
│   └── test_models.py                   # Testes unitários (100 LOC)
├── assets/
├── config/
├── data/
├── docs/
├── notebooks/
├── .gitignore
├── Dockerfile
├── LICENSE                               # MIT
├── README.md
├── pytest.ini
├── requirements.txt
└── setup.py

Início Rápido

# Clonar repositório
git clone https://github.com/galafis/anomaly-detection-advanced.git
cd anomaly-detection-advanced

# Criar ambiente virtual
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Instalar dependências
pip install -r requirements.txt

# Executar demo
python -c "
from src.detectors.statistical import ZScoreDetector
import numpy as np

data = np.concatenate([np.random.randn(100), [10, -8, 12]])
det = ZScoreDetector(threshold=3.0)
det.fit(data)
labels = det.predict(data)
print(f'Anomalias detectadas: {labels.sum()} de {len(data)} amostras')
"

Docker

docker build -t anomaly-detection .
docker run --rm anomaly-detection

Testes

# Executar testes
pytest tests/ -v

# Com cobertura
pytest tests/ --cov=src --cov-report=term-missing

Benchmarks

| Detector | Fit (1k amostras) | Predict (1k) | Tipo |
|---|---|---|---|
| Z-Score | < 1 ms | < 1 ms | Estatístico |
| Modified Z-Score (MAD) | < 1 ms | < 1 ms | Estatístico |
| IQR | < 1 ms | < 1 ms | Estatístico |
| Grubbs Test | < 1 ms | < 1 ms | Estatístico |
| Isolation Forest | ~20 ms | ~5 ms | ML |
| LOF | ~15 ms | ~10 ms | ML |
| One-Class SVM | ~30 ms | ~5 ms | ML |
| CUSUM | < 1 ms | < 1 ms | Temporal |
| Ensemble (3 detectores) | ~50 ms | ~15 ms | Combinação |

Exemplo de Uso

from src.detectors.statistical import ZScoreDetector, IQRDetector
from src.detectors.ml_based import IsolationForestDetector
from src.detectors.ensemble import EnsembleDetector
from src.features.feature_extractor import FeatureExtractor
import numpy as np

# Dados com anomalias injetadas
np.random.seed(42)
normal = np.random.randn(500)
anomalies = np.array([8.5, -7.2, 9.1, -8.8, 10.0])
data = np.concatenate([normal, anomalies]).reshape(-1, 1)

# Detector individual
zscore = ZScoreDetector(threshold=3.0)
labels = zscore.fit_predict(data)
print(f"Z-Score: {labels.sum()} anomalias")

# Ensemble com 3 detectores
ensemble = EnsembleDetector(
    detectors=[
        ZScoreDetector(threshold=3.0),
        IQRDetector(factor=1.5),
        IsolationForestDetector(contamination=0.01, random_state=42),
    ],
    strategy="averaging",
)
ensemble.fit(data)
ensemble_labels = ensemble.predict(data)
print(f"Ensemble: {ensemble_labels.sum()} anomalias")
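
A estratégia `averaging` usada acima pode ser reproduzida à mão: normalizar os scores de cada detector para [0, 1] (min-max), tirar a média e aplicar um limiar. Esboço ilustrativo; a função `averaging_ensemble` e o limiar de 0,5 são suposições didáticas, não a API interna do EnsembleDetector.

```python
import numpy as np


def averaging_ensemble(score_lists, threshold=0.5):
    """Combina scores de N detectores por média normalizada (min-max)."""
    normalized = []
    for scores in score_lists:
        s = np.asarray(scores, dtype=float)
        span = s.max() - s.min()
        # Evita divisão por zero quando todos os scores são iguais
        normalized.append((s - s.min()) / span if span > 0 else np.zeros_like(s))
    mean_score = np.mean(normalized, axis=0)
    return (mean_score > threshold).astype(int), mean_score


# Dois detectores, três amostras: só a última é consenso de anomalia
labels, score = averaging_ensemble([[0.1, 0.2, 0.9], [0.0, 0.3, 1.0]])
# labels == [0, 0, 1]
```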

Aplicabilidade na Indústria

| Setor | Caso de Uso | Detector Recomendado |
|---|---|---|
| Finanças | Detecção de fraude em transações | Isolation Forest + Ensemble |
| IoT / Manufatura | Monitoramento de sensores | CUSUM + Decomposição Sazonal |
| Infraestrutura | Detecção de falhas em servidores | Z-Score + LOF |
| E-commerce | Padrões anômalos de compra | DBSCAN + IQR |
| Saúde | Monitoramento de sinais vitais | Suavização Exponencial |
| Cibersegurança | Detecção de intrusão | One-Class SVM + Ensemble |

Licença

Este projeto está licenciado sob a Licença MIT - veja o arquivo LICENSE para detalhes.


English

About

Professional anomaly detection system integrating four algorithm families within an object-oriented architecture with an abstract class hierarchy (BaseDetector). Includes classical statistical detectors (Z-Score, Modified Z-Score/MAD, Grubbs, IQR), ML-based detectors (Isolation Forest, LOF, One-Class SVM, DBSCAN), temporal detectors (STL seasonal decomposition, CUSUM, exponential smoothing) and a configurable ensemble with voting, averaging and stacking strategies. Complemented by a robust feature extractor that generates rolling statistics, lag features, FFT components and differences to feed the detectors.
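
The Modified Z-Score mentioned above replaces mean and standard deviation with median and MAD, which keeps the estimate robust to the very outliers being hunted. A minimal standalone sketch (the function name is illustrative, not the project's API; 0.6745 is the standard constant that makes MAD consistent with the standard deviation under normality, and 3.5 is the commonly recommended threshold):

```python
import numpy as np


def modified_zscore_labels(x, threshold=3.5):
    """Flag points whose modified z-score |0.6745 * (x - median) / MAD| exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # Median Absolute Deviation
    if mad == 0:
        # Degenerate case: more than half the points are identical
        return np.zeros(len(x), dtype=int)
    mz = 0.6745 * (x - median) / mad
    return (np.abs(mz) > threshold).astype(int)
```

Because median and MAD ignore extreme values, a single huge outlier cannot inflate the scale estimate and mask itself, which is the known weakness of the plain Z-Score.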

Technologies

| Technology | Version | Role |
|---|---|---|
| Python | 3.9+ | Core language |
| NumPy | >= 1.24.0 | Vectorized computation |
| Pandas | >= 2.0.0 | Feature engineering |
| SciPy | >= 1.11.0 | Statistical distributions (Grubbs, z-scores) |
| scikit-learn | >= 1.3.0 | Isolation Forest, LOF, SVM, DBSCAN, Scaler |
| pytest | >= 7.4.0 | Test suite |
| Docker | - | Containerization |

Architecture

graph TD
    subgraph Detectors["Anomaly Detectors"]
        subgraph Stat["Statistical"]
            ZS["Z-Score"]
            MZ["Modified Z-Score<br/>(MAD)"]
            GR["Grubbs Test"]
            IQ["IQR Detector"]
        end
        subgraph ML["Machine Learning"]
            IF["Isolation Forest"]
            LO["LOF"]
            SVM["One-Class SVM"]
            DB["DBSCAN"]
        end
        subgraph TS["Time Series"]
            SD["Seasonal<br/>Decomposition (STL)"]
            CU["CUSUM"]
            ES["Exponential<br/>Smoothing"]
        end
    end

    subgraph Ensemble["Ensemble"]
        EN["EnsembleDetector"]
        EN --> |voting| V["Weighted Voting"]
        EN --> |averaging| AV["Normalized Averaging"]
        EN --> |stacking| ST["Meta-Learner<br/>(LogisticRegression)"]
    end

    subgraph Features["Feature Engineering"]
        FE["FeatureExtractor"]
        FE --> RO["Rolling Stats"]
        FE --> LA["Lag Features"]
        FE --> FF["FFT Spectral"]
        FE --> DI["Diff/Rate-of-Change"]
    end

    BD["BaseDetector<br/>(ABC)"]
    BD --> ZS & MZ & GR & IQ
    BD --> IF & LO & SVM & DB
    BD --> SD & CU & ES
    BD --> EN

    FE -.-> Detectors

Processing Flow

sequenceDiagram
    participant D as Raw Data
    participant FE as FeatureExtractor
    participant DET as Detector(s)
    participant ENS as Ensemble
    participant R as Result

    D->>FE: Time series / tabular
    FE->>FE: Rolling stats + Lags + FFT + Diff
    FE->>DET: Feature matrix

    DET->>DET: fit(X_train)
    DET->>DET: predict(X_test)
    DET-->>R: labels (0=normal, 1=anomaly)

    Note over ENS: Optional combination
    DET->>ENS: scores from N detectors
    ENS->>ENS: voting / averaging / stacking
    ENS-->>R: final labels
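
The FeatureExtractor step in the flow above (rolling stats, lags, diff, FFT) can be sketched with plain pandas and NumPy. Function and column names here are illustrative assumptions, not the project's actual API:

```python
import numpy as np
import pandas as pd


def extract_features(series, window=5, lags=(1, 2, 3)):
    """Build a feature matrix from a 1-D series: rolling stats, lags, diff, FFT energy."""
    s = pd.Series(np.asarray(series, dtype=float))
    feats = pd.DataFrame({
        "value": s,
        "roll_mean": s.rolling(window).mean(),
        "roll_std": s.rolling(window).std(),
        "diff_1": s.diff(),                      # rate of change
    })
    for lag in lags:
        feats[f"lag_{lag}"] = s.shift(lag)
    # One global spectral feature: dominant FFT magnitude (excluding the DC term)
    spectrum = np.abs(np.fft.rfft(s - s.mean()))
    feats["fft_peak"] = spectrum[1:].max() if len(spectrum) > 1 else 0.0
    # Drop warm-up rows introduced by the rolling/lag windows
    return feats.dropna()
```

In a real pipeline the warm-up rows dropped here would instead be back-filled or masked, depending on the downstream detector.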

Project Structure

anomaly-detection-advanced/
├── src/
│   ├── __init__.py
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── statistical.py               # BaseDetector + 4 detectors (411 LOC)
│   │   ├── ml_based.py                  # 4 ML detectors (414 LOC)
│   │   ├── ensemble.py                  # EnsembleDetector (216 LOC)
│   │   └── timeseries.py                # 3 temporal detectors (412 LOC)
│   ├── features/
│   │   ├── __init__.py
│   │   └── feature_extractor.py         # FeatureExtractor (318 LOC)
│   ├── data/
│   │   └── __init__.py
│   ├── evaluation/
│   │   └── __init__.py
│   └── pipeline/
│       └── __init__.py
├── tests/
│   └── test_models.py                   # Unit tests (100 LOC)
├── assets/
├── config/
├── data/
├── docs/
├── notebooks/
├── .gitignore
├── Dockerfile
├── LICENSE                               # MIT
├── README.md
├── pytest.ini
├── requirements.txt
└── setup.py

Quick Start

# Clone repository
git clone https://github.com/galafis/anomaly-detection-advanced.git
cd anomaly-detection-advanced

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run demo
python -c "
from src.detectors.statistical import ZScoreDetector
import numpy as np

data = np.concatenate([np.random.randn(100), [10, -8, 12]])
det = ZScoreDetector(threshold=3.0)
det.fit(data)
labels = det.predict(data)
print(f'Anomalies detected: {labels.sum()} out of {len(data)} samples')
"

Docker

docker build -t anomaly-detection .
docker run --rm anomaly-detection

Tests

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=src --cov-report=term-missing

Benchmarks

| Detector | Fit (1k samples) | Predict (1k) | Type |
|---|---|---|---|
| Z-Score | < 1 ms | < 1 ms | Statistical |
| Modified Z-Score (MAD) | < 1 ms | < 1 ms | Statistical |
| IQR | < 1 ms | < 1 ms | Statistical |
| Grubbs Test | < 1 ms | < 1 ms | Statistical |
| Isolation Forest | ~20 ms | ~5 ms | ML |
| LOF | ~15 ms | ~10 ms | ML |
| One-Class SVM | ~30 ms | ~5 ms | ML |
| CUSUM | < 1 ms | < 1 ms | Temporal |
| Ensemble (3 detectors) | ~50 ms | ~15 ms | Combined |
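
CUSUM, among the fastest detectors in the table above, reduces to a two-sided cumulative-sum recurrence. This is a sketch of the textbook algorithm; parameter names and defaults (slack `k`, decision threshold `h`) are illustrative, not necessarily the project's:

```python
import numpy as np


def cusum_labels(x, target=None, k=0.5, h=5.0):
    """Two-sided CUSUM: flag points where accumulated drift exceeds h.

    k is the slack (in standard deviations) subtracted at each step,
    h the decision threshold on the cumulative sums.
    """
    x = np.asarray(x, dtype=float)
    mu = np.mean(x) if target is None else target
    sigma = np.std(x) or 1.0
    z = (x - mu) / sigma
    s_hi = s_lo = 0.0
    labels = np.zeros(len(x), dtype=int)
    for i, zi in enumerate(z):
        s_hi = max(0.0, s_hi + zi - k)   # accumulates upward drift
        s_lo = max(0.0, s_lo - zi - k)   # accumulates downward drift
        if s_hi > h or s_lo > h:
            labels[i] = 1
            s_hi = s_lo = 0.0            # reset after an alarm
    return labels
```

Because each point costs O(1), fit and predict stay sub-millisecond on 1k samples, consistent with the benchmark row.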

Usage Example

from src.detectors.statistical import ZScoreDetector, IQRDetector
from src.detectors.ml_based import IsolationForestDetector
from src.detectors.ensemble import EnsembleDetector
from src.features.feature_extractor import FeatureExtractor
import numpy as np

# Data with injected anomalies
np.random.seed(42)
normal = np.random.randn(500)
anomalies = np.array([8.5, -7.2, 9.1, -8.8, 10.0])
data = np.concatenate([normal, anomalies]).reshape(-1, 1)

# Individual detector
zscore = ZScoreDetector(threshold=3.0)
labels = zscore.fit_predict(data)
print(f"Z-Score: {labels.sum()} anomalies")

# Ensemble with 3 detectors
ensemble = EnsembleDetector(
    detectors=[
        ZScoreDetector(threshold=3.0),
        IQRDetector(factor=1.5),
        IsolationForestDetector(contamination=0.01, random_state=42),
    ],
    strategy="averaging",
)
ensemble.fit(data)
ensemble_labels = ensemble.predict(data)
print(f"Ensemble: {ensemble_labels.sum()} anomalies")
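
The `stacking` strategy shown in the architecture diagram trains a LogisticRegression meta-learner on base-detector scores. A self-contained sketch of the idea using scikit-learn directly (not the project's EnsembleDetector internals, which may differ):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: a normal cluster plus a small, distant anomalous cluster
rng = np.random.default_rng(42)
X = np.concatenate([rng.normal(0, 1, (500, 2)), rng.normal(6, 1, (10, 2))])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(10, dtype=int)])  # labels for the meta-learner

# Base layer: anomaly scores from two unsupervised detectors
iso = IsolationForest(contamination=0.02, random_state=42).fit(X)
lof = LocalOutlierFactor(novelty=True).fit(X)
# Negate score_samples so that higher always means "more anomalous"
meta_X = np.column_stack([-iso.score_samples(X), -lof.score_samples(X)])

# Meta-learner: logistic regression over the stacked scores
meta = LogisticRegression(class_weight="balanced").fit(meta_X, y)
final_labels = meta.predict(meta_X)
```

Stacking needs some labeled (or pseudo-labeled) points to fit the meta-learner, which is why voting and averaging remain the fully unsupervised options.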

Industry Applicability

| Sector | Use Case | Recommended Detector |
|---|---|---|
| Finance | Transaction fraud detection | Isolation Forest + Ensemble |
| IoT / Manufacturing | Sensor monitoring | CUSUM + Seasonal Decomposition |
| Infrastructure | Server failure detection | Z-Score + LOF |
| E-commerce | Anomalous purchase patterns | DBSCAN + IQR |
| Healthcare | Vital signs monitoring | Exponential Smoothing |
| Cybersecurity | Intrusion detection | One-Class SVM + Ensemble |

License

This project is licensed under the MIT License - see the LICENSE file for details.


Autor / Author: Gabriel Demetrios Lafis

GitHub
LinkedIn