galafis/anomaly-detection-advanced
Anomaly Detection Advanced - Professional Python project
Anomaly Detection Advanced
Modular anomaly detection platform with 11 algorithms spanning statistical, machine learning, time-series and ensemble methods, designed for production monitoring pipelines.
English
About
Professional anomaly detection system integrating four algorithm families within an object-oriented architecture with an abstract class hierarchy (BaseDetector). Includes classical statistical detectors (Z-Score, Modified Z-Score/MAD, Grubbs, IQR), ML-based detectors (Isolation Forest, LOF, One-Class SVM, DBSCAN), temporal detectors (STL seasonal decomposition, CUSUM, exponential smoothing) and a configurable ensemble with voting, averaging and stacking strategies. Complemented by a robust feature extractor that generates rolling statistics, lag features, FFT components and differences to feed the detectors.
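To make the class hierarchy concrete, here is a minimal sketch of an abstract base class with one statistical detector built on it. The names `SketchBaseDetector` and `SketchModifiedZScore` are hypothetical and are not the project's classes. Only the `fit` / `predict` / `fit_predict` methods and the 0/1 label convention are taken from this README. The 0.6745 constant and the 3.5 default threshold follow the usual modified z-score convention, not necessarily the project's defaults.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchBaseDetector(ABC):
    """Hypothetical stand-in for BaseDetector (interface assumed)."""

    @abstractmethod
    def fit(self, X):
        """Learn reference statistics from X and return self."""

    @abstractmethod
    def predict(self, X):
        """Return an integer array: 1 for anomalies, 0 for normal points."""

    def fit_predict(self, X):
        return self.fit(X).predict(X)


class SketchModifiedZScore(SketchBaseDetector):
    """Modified z-score based on the median and median absolute deviation."""

    def __init__(self, threshold=3.5):
        self.threshold = threshold

    def fit(self, X):
        x = np.asarray(X, dtype=float).ravel()
        self.median_ = np.median(x)
        self.mad_ = np.median(np.abs(x - self.median_))
        return self

    def predict(self, X):
        x = np.asarray(X, dtype=float).ravel()
        if self.mad_ == 0:
            # Degenerate spread: flag nothing rather than divide by zero.
            return np.zeros(x.shape, dtype=int)
        scores = 0.6745 * np.abs(x - self.median_) / self.mad_
        return (scores > self.threshold).astype(int)


rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(size=200), [10.0, -8.0, 12.0]])
labels = SketchModifiedZScore().fit_predict(data)
print(f"Flagged {labels.sum()} of {len(data)} points")
```

Because the median and MAD are barely moved by a few extreme values, this detector tends to be less sensitive to the anomalies it is trying to find than a mean/standard-deviation z-score.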
Technologies
| Technology | Version | Role |
|---|---|---|
| Python | 3.9+ | Core language |
| NumPy | >= 1.24.0 | Vectorized computation |
| Pandas | >= 2.0.0 | Feature engineering |
| SciPy | >= 1.11.0 | Statistical distributions (Grubbs, z-scores) |
| scikit-learn | >= 1.3.0 | Isolation Forest, LOF, SVM, DBSCAN, Scaler |
| pytest | >= 7.4.0 | Test suite |
| Docker | - | Containerization |
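As an illustration of the SciPy role listed above, the two-sided Grubbs test can be sketched with `scipy.stats.t`. The function name `grubbs_outlier_index` and the default `alpha` are assumptions for this example, and the project's own Grubbs detector may be organized differently. The test assumes approximately normal data and checks only the single most extreme point.

```python
import numpy as np
from scipy import stats


def grubbs_outlier_index(values, alpha=0.05):
    """Return the index of the most extreme point if the two-sided Grubbs
    test rejects it at level alpha, otherwise None."""
    x = np.asarray(values, dtype=float).ravel()
    n = x.size
    if n < 3:
        return None  # the test is undefined for fewer than 3 points
    std = x.std(ddof=1)
    if std == 0:
        return None  # constant data has no outlier
    deviations = np.abs(x - x.mean())
    idx = int(np.argmax(deviations))
    g_stat = deviations[idx] / std
    # Critical value derived from the Student t distribution with n-2 dof.
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return idx if g_stat > g_crit else None


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 15.0]
print(grubbs_outlier_index(sample))
```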
Architecture
graph TD
subgraph Detectors["Anomaly Detectors"]
subgraph Stat["Statistical"]
ZS["Z-Score"]
MZ["Modified Z-Score<br/>(MAD)"]
GR["Grubbs Test"]
IQ["IQR Detector"]
end
subgraph ML["Machine Learning"]
IF["Isolation Forest"]
LO["LOF"]
SVM["One-Class SVM"]
DB["DBSCAN"]
end
subgraph TS["Time Series"]
SD["Seasonal<br/>Decomposition (STL)"]
CU["CUSUM"]
ES["Exponential<br/>Smoothing"]
end
end
subgraph Ensemble["Ensemble"]
EN["EnsembleDetector"]
EN --> |voting| V["Weighted Voting"]
EN --> |averaging| AV["Normalized Averaging"]
EN --> |stacking| ST["Meta-Learner<br/>(LogisticRegression)"]
end
subgraph Features["Feature Engineering"]
FE["FeatureExtractor"]
FE --> RO["Rolling Stats"]
FE --> LA["Lag Features"]
FE --> FF["FFT Spectral"]
FE --> DI["Diff/Rate-of-Change"]
end
BD["BaseDetector<br/>(ABC)"]
BD --> ZS & MZ & GR & IQ
BD --> IF & LO & SVM & DB
BD --> SD & CU & ES
BD --> EN
FE -.-> Detectors
Processing Flow
sequenceDiagram
participant D as Raw Data
participant FE as FeatureExtractor
participant DET as Detector(s)
participant ENS as Ensemble
participant R as Result
D->>FE: Time series / tabular
FE->>FE: Rolling stats + Lags + FFT + Diff
FE->>DET: Feature matrix
DET->>DET: fit(X_train)
DET->>DET: predict(X_test)
DET-->>R: labels (0=normal, 1=anomaly)
Note over ENS: Optional combination
DET->>ENS: scores from N detectors
ENS->>ENS: voting / averaging / stacking
ENS-->>R: final labels
Project Structure
anomaly-detection-advanced/
├── src/
│   ├── __init__.py
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── statistical.py          # BaseDetector + 4 detectors (411 LOC)
│   │   ├── ml_based.py             # 4 ML detectors (414 LOC)
│   │   ├── ensemble.py             # EnsembleDetector (216 LOC)
│   │   └── timeseries.py           # 3 temporal detectors (412 LOC)
│   ├── features/
│   │   ├── __init__.py
│   │   └── feature_extractor.py    # FeatureExtractor (318 LOC)
│   ├── data/
│   │   └── __init__.py
│   ├── evaluation/
│   │   └── __init__.py
│   └── pipeline/
│       └── __init__.py
├── tests/
│   └── test_models.py              # Unit tests (100 LOC)
├── assets/
├── config/
├── data/
├── docs/
├── notebooks/
├── .gitignore
├── Dockerfile
├── LICENSE                         # MIT
├── README.md
├── pytest.ini
├── requirements.txt
└── setup.py
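The `feature_extractor.py` module in the tree above is described as producing rolling statistics, lag features, FFT components and differences. A rough sketch of that kind of feature matrix, using pandas and NumPy directly, might look like the following. The function `sketch_features`, its parameters and its column names are hypothetical, and the project's `FeatureExtractor` API is not shown in this README.

```python
import numpy as np
import pandas as pd


def sketch_features(series, window=5, lags=(1, 2), n_fft=3):
    """Build rolling, lag, difference and FFT features for a 1-D series."""
    s = pd.Series(np.asarray(series, dtype=float))
    feats = pd.DataFrame({"value": s})
    feats["rolling_mean"] = s.rolling(window).mean()
    feats["rolling_std"] = s.rolling(window).std()
    for lag in lags:
        feats[f"lag_{lag}"] = s.shift(lag)
    feats["diff_1"] = s.diff()
    # Global spectral summary: magnitudes of the first non-DC FFT bins,
    # repeated on every row so they sit next to the per-sample features.
    spectrum = np.abs(np.fft.rfft(s.to_numpy()))
    for k in range(1, n_fft + 1):
        feats[f"fft_mag_{k}"] = spectrum[k] if k < spectrum.size else 0.0
    # Leading rows lack a full window or lag history; drop them.
    return feats.dropna().reset_index(drop=True)


t = np.arange(64)
signal = np.sin(2 * np.pi * t / 16)
features = sketch_features(signal)
print(features.shape)
```

The resulting frame can be passed to any detector that accepts a 2-D feature matrix, for example via `features.to_numpy()`.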
Quick Start
# Clone repository
git clone https://github.com/galafis/anomaly-detection-advanced.git
cd anomaly-detection-advanced
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run demo
python -c "
from src.detectors.statistical import ZScoreDetector
import numpy as np
data = np.concatenate([np.random.randn(100), [10, -8, 12]])
det = ZScoreDetector(threshold=3.0)
det.fit(data)
labels = det.predict(data)
print(f'Anomalies detected: {labels.sum()} out of {len(data)} samples')
"
Docker
docker build -t anomaly-detection .
docker run --rm anomaly-detection
Tests
# Run tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=src --cov-report=term-missing
Benchmarks
| Detector | Fit (1k samples) | Predict (1k) | Type |
|---|---|---|---|
| Z-Score | < 1 ms | < 1 ms | Statistical |
| Modified Z-Score (MAD) | < 1 ms | < 1 ms | Statistical |
| IQR | < 1 ms | < 1 ms | Statistical |
| Grubbs Test | < 1 ms | < 1 ms | Statistical |
| Isolation Forest | ~20 ms | ~5 ms | ML |
| LOF | ~15 ms | ~10 ms | ML |
| One-Class SVM | ~30 ms | ~5 ms | ML |
| CUSUM | < 1 ms | < 1 ms | Temporal |
| Ensemble (3 detectors) | ~50 ms | ~15 ms | Combined |
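The table does not state the hardware or the measurement method, so the figures are best read as orders of magnitude. One way to collect comparable numbers on your own machine is a small harness such as the sketch below. It times scikit-learn's `IsolationForest` directly, not the project's wrapper class, and the helper `time_call` and the single-feature data shape are assumptions.

```python
import time

import numpy as np
from sklearn.ensemble import IsolationForest


def time_call(fn, repeats=5):
    """Return the best wall-clock time in milliseconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 1))  # 1k samples, one feature (assumed shape)

model = IsolationForest(contamination=0.01, random_state=42)
fit_ms = time_call(lambda: model.fit(X))
predict_ms = time_call(lambda: model.predict(X))
print(f"fit: {fit_ms:.1f} ms, predict: {predict_ms:.1f} ms")
```

Taking the minimum over a few repeats reduces noise from other processes. Results will still vary with CPU, library versions and the number of features.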
Usage Example
from src.detectors.statistical import ZScoreDetector, IQRDetector
from src.detectors.ml_based import IsolationForestDetector
from src.detectors.ensemble import EnsembleDetector
from src.features.feature_extractor import FeatureExtractor
import numpy as np
# Data with injected anomalies
np.random.seed(42)
normal = np.random.randn(500)
anomalies = np.array([8.5, -7.2, 9.1, -8.8, 10.0])
data = np.concatenate([normal, anomalies]).reshape(-1, 1)
# Individual detector
zscore = ZScoreDetector(threshold=3.0)
labels = zscore.fit_predict(data)
print(f"Z-Score: {labels.sum()} anomalies")
# Ensemble with 3 detectors
ensemble = EnsembleDetector(
    detectors=[
        ZScoreDetector(threshold=3.0),
        IQRDetector(factor=1.5),
        IsolationForestDetector(contamination=0.01, random_state=42),
    ],
    strategy="averaging",
)
ensemble.fit(data)
ensemble_labels = ensemble.predict(data)
print(f"Ensemble: {ensemble_labels.sum()} anomalies")
Industry Applicability
| Sector | Use Case | Recommended Detector |
|---|---|---|
| Finance | Transaction fraud detection | Isolation Forest + Ensemble |
| IoT / Manufacturing | Sensor monitoring | CUSUM + Seasonal Decomposition |
| Infrastructure | Server failure detection | Z-Score + LOF |
| E-commerce | Anomalous purchase patterns | DBSCAN + IQR |
| Healthcare | Vital signs monitoring | Exponential Smoothing |
| Cybersecurity | Intrusion detection | One-Class SVM + Ensemble |
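The table recommends CUSUM for sensor monitoring. For readers unfamiliar with the method, a two-sided tabular CUSUM can be sketched as follows. It assumes the in-control mean and standard deviation are known in advance. The function `cusum_alarms`, the slack `k`, the decision threshold `h` and the restart-after-alarm behavior are illustrative choices and not the project's CUSUM detector.

```python
import numpy as np


def cusum_alarms(values, target_mean, target_std, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardized values. Returns a 0/1 array
    with 1 where either cumulative sum exceeds h."""
    x = (np.asarray(values, dtype=float).ravel() - target_mean) / target_std
    labels = np.zeros(x.size, dtype=int)
    s_hi = s_lo = 0.0
    for i, z in enumerate(x):
        # Accumulate deviations beyond the slack k in each direction.
        s_hi = max(0.0, s_hi + z - k)
        s_lo = max(0.0, s_lo - z - k)
        if s_hi > h or s_lo > h:
            labels[i] = 1
            s_hi = s_lo = 0.0  # restart both sums after an alarm
    return labels


rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=200)
shifted = rng.normal(2.0, 1.0, size=100)  # mean shift of two sigma
stream = np.concatenate([baseline, shifted])
alarms = cusum_alarms(stream, target_mean=0.0, target_std=1.0)
print("first alarm index:", int(np.argmax(alarms)) if alarms.any() else None)
```

Unlike a pointwise z-score, CUSUM accumulates small deviations over time, which is why it suits slow drifts in sensor readings that never cross a per-sample threshold.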
License
This project is licensed under the MIT License - see the LICENSE file for details.