machphy/malicious-domains
Aggregates public OSINT malicious-domain feeds, normalizes and deduplicates IOCs, and generates clean domain blocklists for SOC, SIEM, DNS filtering, and threat-intel research.
Malicious Domains – Open Threat Intelligence Feed Aggregator
This repository aggregates multiple public threat intelligence (TI) data sources into a single, normalized, and de-duplicated list of malicious, phishing, C2, and suspicious domains.
The goal is to provide a clean, ready-to-consume IOC dataset for:
- SOC & DFIR teams
- Blue-team threat hunting
- SIEM lookup enrichment
- DNS/Firewall blocking
- OSINT/CTI research
Key Features
✔ Aggregates 19+ raw feeds
✔ Extracts domains using strict regex
✔ Automatically deduplicates
✔ Deterministic sorted output (stable Git diffs)
✔ CI/CD ready feed pipeline
✔ Designed for SOC production environments
Repository Layout
malicious-domains/
├── sources/ # Raw upstream threat intel feeds
├── scripts/ # TI ingestion + normalization pipeline
│ ├── update_feeds.sh
│ └── combine_feeds.py
├── output/ # Final unified domain lists
│ ├── domains.txt
│ └── domains.csv
├── docs/ # Engineering documentation
│ ├── ARCHITECTURE.md
│ ├── DATA_MODEL.md
│ └── FEED_SOURCES.md
└── CONTRIBUTING.md
Architecture Summary
The pipeline follows a clean separation of layers:
[Raw OSINT Feeds] --> sources/
(untouched)
sources/ --> combine_feeds.py
(parse + extract + dedupe)
combine_feeds.py --> output/
(normalized artifacts)
Principles:
- Lossless ingestion (retain original data in sources/)
- Normalization only in scripts
- Idempotent runs
- Deterministic ordering
More visuals: see docs/ARCHITECTURE.md
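The principles above (dedupe, idempotent runs, deterministic ordering) can be sketched in a few lines. This is a minimal illustration, not the actual contents of scripts/combine_feeds.py: the function names `normalize`, `combine`, and `run` are hypothetical, and extraction is reduced to stripping comments and lower-casing so the focus stays on ordering and idempotency.

```python
from pathlib import Path


def normalize(line):
    """Placeholder extraction: drop comments and blanks, lower-case the rest."""
    value = line.split("#", 1)[0].strip().lower()
    return value or None


def combine(feed_texts, extract=normalize):
    """Merge raw feed contents into a sorted, de-duplicated list."""
    unique = set()
    for text in feed_texts:
        for line in text.splitlines():
            domain = extract(line)
            if domain is not None:
                unique.add(domain)
    # Sorting makes the result independent of feed order and of set iteration order.
    return sorted(unique)


def run(sources_dir="sources", output_file="output/domains.txt"):
    """Read every .txt feed (never modifying it) and rewrite the output file."""
    feeds = [
        path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(sources_dir).glob("*.txt"))
    ]
    domains = combine(feeds)
    out = Path(output_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(f"{d}\n" for d in domains), encoding="utf-8")
    return domains
```

Because the result is a sorted set, running the step twice on unchanged sources should produce byte-identical output, which is what keeps Git diffs stable.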
Running the Pipeline
Update feeds (optional)
You can wire this script to cron or a GitHub Action.
./scripts/update_feeds.sh
This refreshes raw .txt feed files in sources/.
NOTE: Replace placeholder URLs in the script with real feed URLs.
Combine & Normalize
python3 scripts/combine_feeds.py
Outputs generated under output/:
| File | Purpose |
|---|---|
| domains.txt | One domain per line (ready for DNS/firewall use) |
| domains.csv | CSV with a header row (SIEM lookup tables, SOAR enrichment) |
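As a rough illustration of the two artifacts, the sketch below renders the same sorted domain list in both shapes. The single column name `domain` is an assumption; check the header actually written by scripts/combine_feeds.py before building lookups on it. The function names are hypothetical.

```python
import csv
import io


def render_txt(domains):
    """One domain per line, newline-terminated."""
    return "".join(f"{d}\n" for d in domains)


def render_csv(domains, header="domain"):
    """Single-column CSV with a header row (column name is an assumption)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([header])
    for d in domains:
        writer.writerow([d])
    return buf.getvalue()
```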
Indicators Data Model
- Indicator type: Domain
- Regex-based strict extraction
- Canonical form: lower-cased domain only
- No URLs, IPs, paths, or protocols
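The rules above could look roughly like this in Python. The pattern is an illustrative assumption, not the regex used in scripts/combine_feeds.py: it accepts labels of 1 to 63 characters that start and end with an alphanumeric, requires an alphabetic or punycode TLD (which also rejects bare IPv4 addresses), and the hypothetical helper `extract_domain` strips schemes, ports, and paths before matching.

```python
import re

# Assumed pattern for illustration; the project's own regex may differ.
DOMAIN_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"
    r"(?:[a-z]{2,63}|xn--[a-z0-9-]{1,59})"
)
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://")


def extract_domain(line):
    """Return the first lower-cased domain found on a feed line, or None."""
    line = line.split("#", 1)[0].strip().lower()
    for token in line.split():
        host = SCHEME_RE.sub("", token)
        # Cut off path, query string, and port; drop a trailing root dot.
        host = re.split(r"[/?:]", host, maxsplit=1)[0].rstrip(".")
        if len(host) <= 253 and DOMAIN_RE.fullmatch(host):
            return host
    return None
```

Scanning token by token lets the same helper handle plain lists, URL feeds, and hosts-style lines such as `0.0.0.0 bad-domain.example`.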
Future metadata planned:
- source feed
- threat type (phishing/malware/c2)
- first_seen / last_seen timestamps
- confidence score
More details: docs/DATA_MODEL.md
Feed Sources
All OSINT-provider files are located in sources/.
Mapping details: docs/FEED_SOURCES.md
Practical Integration Examples
SOC / SIEM Threat Enrichment
Upload output/domains.csv as:
- A lookup table
- A dynamic blacklist
- An enrichment dataset
Use case: when DNS/Proxy/Firewall logs contain a domain:
- check membership in this list
- tag as suspicious
- map to threat intelligence source
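The membership check and tagging steps can be sketched as follows. The event shape (a dict with a `domain` key) and the output field names `ti_match` and `suspicious` are hypothetical, and matching subdomains of a listed domain is an assumption layered on top of plain membership. Mapping a hit back to its source feed is not shown, since per-domain source metadata is still listed as planned.

```python
def load_blocklist(lines):
    """Build a set of lower-cased domains from an iterable of text lines."""
    return {
        ln.strip().lower()
        for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    }


def match_blocklist(hostname, blocklist):
    """Return the listed domain that hostname equals or is a subdomain of."""
    labels = hostname.strip().lower().rstrip(".").split(".")
    # Try the full name first, then each parent, stopping before the bare TLD.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in blocklist:
            return candidate
    return None


def enrich(event, blocklist):
    """Return a copy of the log event tagged with the lookup result."""
    hit = match_blocklist(event.get("domain", ""), blocklist)
    tagged = dict(event)
    tagged["ti_match"] = hit
    tagged["suspicious"] = hit is not None
    return tagged
```

In practice the blocklist would be loaded once, for example with `load_blocklist(open("output/domains.txt", encoding="utf-8"))`, and reused for every event.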
DNS Blocking (Pi-hole, Bind, Unbound)
Convert domains to hosts file format:
0.0.0.0 bad-domain.example
Example:
sed 's/^/0.0.0.0 /' output/domains.txt > output/hosts.txt
Use hosts.txt as the blocklist.
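The same conversion can be done in Python when the list is already loaded in a script. The sketch below mirrors the sed one-liner and adds a second, hypothetical helper for Unbound, which is usually configured with `local-zone` directives rather than a hosts file; verify the directive against your resolver's documentation before deploying.

```python
def to_hosts(domains, sink="0.0.0.0"):
    """Render hosts-file lines, one '<sink> <domain>' entry per domain."""
    return "".join(f"{sink} {d.strip()}\n" for d in domains if d.strip())


def to_unbound(domains):
    """Render Unbound local-zone directives that answer NXDOMAIN."""
    return "".join(
        f'local-zone: "{d.strip()}." always_nxdomain\n'
        for d in domains
        if d.strip()
    )
```

Both helpers accept any iterable of lines, so an open file handle on output/domains.txt can be passed directly.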
Firewall (Fortigate / Palo Alto)
Convert to bulk blacklist import format.
Example URL pattern:
*.malicious-domain.com
Future plan: auto-generate firewall import format.
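Until that exists, a small helper can produce the wildcard pattern shown above. This is a sketch only: `to_wildcards` is a hypothetical name, the bare domain is emitted alongside `*.domain` on the assumption that a wildcard entry may not cover the apex, and the exact import syntax depends on the firewall vendor.

```python
def to_wildcards(domains, include_apex=True):
    """Return '*.domain' patterns, optionally preceded by the bare domain."""
    entries = []
    for raw in domains:
        domain = raw.strip().lower()
        if not domain:
            continue
        if include_apex:
            entries.append(domain)
        entries.append(f"*.{domain}")
    return entries
```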
SOAR Automation
Feed domains.csv into:
- Cortex XSOAR playbooks
- Shuffle automations
- Any custom SOC enrichment microservice
Research & OSINT Use Cases
✔ Malicious infra trend analysis
✔ Domain age profiling
✔ Malware campaign correlation
✔ TI scoring models
✔ WHOIS intel pivoting
✔ APT/C2 infra clustering
🛠 Roadmap
- Add automated feed ingestion via GitHub Actions
- Export artifacts:
  - STIX
  - MISP JSON
  - hosts file
- Add metadata annotations:
  - threat_type
  - first_seen
  - confidence
- Build lookup API for realtime domain reputation:
  GET /lookup?domain=xyz.com
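One possible shape for the planned lookup endpoint, using only the Python standard library. Nothing here exists in the repository yet: the response fields (`domain`, `listed`), the error bodies, and the names `handle_lookup` and `make_handler` are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit


def handle_lookup(request_target, blocklist):
    """Map a request target such as '/lookup?domain=xyz.com' to (status, JSON body)."""
    parts = urlsplit(request_target)
    if parts.path != "/lookup":
        return 404, json.dumps({"error": "not found"})
    values = parse_qs(parts.query).get("domain", [])
    if not values:
        return 400, json.dumps({"error": "missing domain parameter"})
    domain = values[0].strip().lower().rstrip(".")
    return 200, json.dumps({"domain": domain, "listed": domain in blocklist})


def make_handler(blocklist):
    """Build a request handler class bound to an in-memory blocklist set."""

    class LookupHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = handle_lookup(self.path, blocklist)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return LookupHandler
```

To try it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8080), make_handler(blocklist))` and call `serve_forever()`. Keeping the lookup logic in a plain function makes it testable without opening a socket.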
Contributing
Contributions welcome!
Please check CONTRIBUTING.md
Disclaimer
All data are collected for:
- research
- blue-team defensive security
- SOC/Threat Intel usage only
❗ Do NOT use this dataset for any offensive or unlawful purpose.
❗ The maintainer accepts no liability for misuse.