alok-kumar8765/Data_Recovery_Using_Python
Open-source forensic imaging & file-carving suite that recovers 40+ formats (JPEG, PNG, PDF, DOCX, MP4, ZIP, WAV, etc.) from raw disks, USB sticks, SD cards or disk-images. Creates bit-for-bit copies, SHA-256/MD5/SHA-1 triple hashes, logs chain-of-custody, exports deleted entries via The Sleuth Kit and performs multi-threaded signature-based .
FBI-Level Data-Recovery Toolkit (Community Edition)
Table of Contents
- Quick Peek
- What & Why
- Real-World Use-Cases
- Architecture & Data-Flow Diagrams
- Installation
- Usage Examples
- Code Walk-Through
- Pros & Cons
- Road-Map
- Disclaimer & Legal
Quick Peek
# 1. image a suspect USB stick
sudo python recover.py /dev/sdb --image --carve -o case042/
# 2. review report
open case042/report.html
Recovers 40+ file types, hashes everything, logs chain-of-custody, and produces a ready-to-submit CSV.
What & Why
A 100 % Pythonic, open-source subset of the forensic imaging & carving stack used by FBI/INTERPOL labs.
It performs bit-for-bit imaging, deleted-file reconstruction (TSK), raw file-carving (headers/footers), and SHA-256 hashing, all without proprietary black boxes.
| Layer | Public tool we mimic |
|---|---|
| Imaging | dd / ewfacquire |
| File-system parsing | The Sleuth Kit (fls, icat) |
| Carving | PhotoRec signatures |
| Reporting | CSV + SHA-256 |
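The imaging layer in the table above can be pictured as a stream copy that feeds a hash while it writes. The following is a minimal sketch of that idea, not the project's `imager.py`; the function names and the 1 MiB chunk size are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def image_and_hash(source: str, destination: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy source to destination byte for byte and return the SHA-256 hex digest.

    Hypothetical helper: hashing happens during the copy, so the source
    is read only once.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of input
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def hash_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Re-hash a file from disk, for example to verify the image after writing."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest returned by `image_and_hash` with a later `hash_file` call on the image is one simple way to confirm that the copy on disk still matches what was read from the device.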
Real-World Use-Cases
- Corporate IR: recover ransomware-deleted finance spreadsheets.
- Law-enforcement: preview USB before sending to expensive lab.
- University lab: teach the forensic pipeline without €5k licenses.
- Home user: rescue SD-card wedding photos.
- CI/CD security: scan build artifacts for leaked credentials.
Architecture & Data-Flow Diagrams
High-Level Architecture
graph TD
A([Block Device<br/>/dev/sdb]) -->|1. dd image| B[Forensic Image<br/>SHA-256 hashed]
B --> C{SleuthKit<br/>fls + icat}
B --> D[File-Carver<br/>magic bytes]
C --> E[Deleted Files<br/>report.csv]
D --> F[Carved Files<br/>/carved/]
E & F --> G[HTML Report]
DFD-Level 0 (Context)
graph LR
Investigator -->|device path| S[System]
S -->|CSV + files| Investigator
S -->|log| Evidence_Locker
DFD-Level 1 (Decomposed)
graph TD
subgraph "Imaging Module"
IM1[Read raw bytes] --> IM2[Write img] --> IM3[Hash img]
end
subgraph "File-System Module"
FS1[fls] --> FS2[inode list] --> FS3[icat export]
end
subgraph "Carving Module"
CA1[Scan chunks] --> CA2[Match sig] --> CA3[Write file]
end
IM3 --> FS1
IM3 --> CA1
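The File-System Module above (fls, then an inode list, then icat export) could be wrapped roughly as follows. This is a sketch under assumptions: it shells out to The Sleuth Kit's `fls -r -p -d` and `icat`, and it assumes the usual `type [*] inode:<tab>name` line layout in the `fls` output. The `FlsEntry` class and all function names are hypothetical and are not taken from `tsk_wrapper.py`.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List

# Assumed line layout, e.g. "r/r * 12:\tdocs/budget.xlsx"
FLS_LINE = re.compile(
    r"^(?P<type>\S+)\s+(?P<deleted>\*\s+)?(?P<inode>[\d-]+)"
    r"(?P<realloc>\(realloc\))?:\s+(?P<name>.*)$"
)


@dataclass
class FlsEntry:
    type: str
    inode: str
    name: str
    deleted: bool
    realloc: bool


def parse_fls_output(text: str) -> List[FlsEntry]:
    """Turn fls text output into entries, skipping lines that do not match."""
    entries = []
    for line in text.splitlines():
        match = FLS_LINE.match(line)
        if match:
            entries.append(FlsEntry(
                type=match["type"],
                inode=match["inode"],
                name=match["name"],
                deleted=match["deleted"] is not None,
                realloc=match["realloc"] is not None,
            ))
    return entries


def list_deleted(image: str) -> List[FlsEntry]:
    """Run fls recursively (-r), with full paths (-p), deleted entries only (-d)."""
    result = subprocess.run(
        ["fls", "-r", "-p", "-d", image],
        check=True, capture_output=True, text=True,
    )
    return parse_fls_output(result.stdout)


def export_inode(image: str, inode: str, out_path: str) -> None:
    """Write the content of one inode to out_path using icat."""
    with open(out_path, "wb") as out:
        subprocess.run(["icat", image, inode], check=True, stdout=out)
```

Keeping the parser separate from the subprocess call means it can be exercised on captured text without a disk image or an installed Sleuth Kit.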
Flow Diagram (CLI journey)
flowchart LR
Start --> ParseArgs{device?}
ParseArgs -->|yes| Image[Create image] --> Hash[SHA-256]
Hash --> TSK[TSK deleted] --> Carve[Raw carve] --> Report[Generate report] --> End
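The CLI journey above can be sketched with `argparse`. The positional source and the `--image`, `--carve` and `-o` flags come from the commands shown in this README; the long option `--output`, its default value, and the `plan_steps` helper are assumptions for illustration and may differ from the real `recover.py`.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags used in the README commands."""
    parser = argparse.ArgumentParser(
        prog="recover.py",
        description="Image a device, list deleted files and carve raw data.",
    )
    parser.add_argument("source", help="block device or disk image, e.g. /dev/sdb")
    parser.add_argument("--image", action="store_true",
                        help="create a forensic image before analysis")
    parser.add_argument("--carve", action="store_true",
                        help="run signature-based carving")
    # The long name and the default are assumptions for this sketch.
    parser.add_argument("-o", "--output", default="recovered",
                        help="output directory")
    return parser


def plan_steps(args: argparse.Namespace) -> List[str]:
    """Return the pipeline stages in the order drawn in the flow diagram."""
    steps = []
    if args.image:
        steps += ["image", "hash"]
    steps.append("tsk")  # assumed to always run, as in the quick-scan command
    if args.carve:
        steps.append("carve")
    steps.append("report")
    return steps
```

For example, `plan_steps(build_parser().parse_args(["/dev/sdb", "--image", "--carve", "-o", "case042/"]))` yields the image, hash, tsk, carve and report stages in that order.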
Installation
| OS | One-liner |
|---|---|
| Ubuntu / Debian | sudo apt install sleuthkit libtsk-dev foremost && pip install -r requirements.txt |
| macOS | brew install sleuthkit foremost && pip install -r requirements.txt |
| Windows | Use WSL2, then follow the Ubuntu instructions above (native build possible but painful) |
Detailed steps
git clone https://github.com/alok-kumar8765/Data_Recovery_Using_Python.git
cd Data_Recovery_Using_Python
python -m venv venv && source venv/bin/activate
pip install -U pip wheel
pip install -r requirements.txt
sudo make install-tools # optional: copies udev rules, man page
Usage Examples
| Goal | Command |
|---|---|
| Quick deleted-file scan | sudo python recover.py /dev/sdb |
| Full imaging + carving | sudo python recover.py /dev/sdb --image --carve -o case042 |
| Re-scan existing image | python recover.py disk.img --carve |
| Windows (WSL) | python recover.py /mnt/e/disk.img --carve |
Output tree:
case042/
├── forensic.img
├── forensic.img.sha256
├── report.html
├── sleuthkit/
│   └── sleuthkit.csv
└── carved/
    ├── JPG_0000001234.jpg
    └── PDF_0000005678.pdf
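A per-file CSV with SHA-256 hashes for a case directory like the one above could be produced along these lines. This is only a sketch: the column names (`path`, `size_bytes`, `sha256`) and the `write_manifest` helper are assumptions, not the layout that `reporter.py` actually emits.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: str, csv_path: str) -> int:
    """Write one CSV row per file under root and return the number of rows.

    csv_path should live outside root; otherwise the half-written CSV
    would be listed in its own manifest.
    """
    rows = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes", "sha256"])  # hypothetical columns
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                writer.writerow([
                    path.relative_to(root).as_posix(),
                    path.stat().st_size,
                    sha256_of(path),
                ])
                rows += 1
    return rows
```

Sorting the paths keeps the manifest stable between runs, which makes two reports of the same case easy to diff.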
Code Walk-Through
| File | Purpose |
|---|---|
| recover.py | CLI entry-point, orchestrates imaging → TSK → carving |
| imager.py | Stream copy with progress bar & SHA-256 |
| tsk_wrapper.py | Sub-process wrapper for fls, icat |
| carver.py | Multi-threaded signature scanner |
| signatures.py | 40+ file headers/footers |
| reporter.py | CSV + HTML report generator |
| utils.py | Hashing, human-readable bytes, etc. |
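To make the header/footer idea behind `carver.py` and `signatures.py` concrete, here is a minimal, single-threaded sketch that scans an in-memory buffer. It is not the project's implementation: the real scanner is described as multi-threaded and covers 40+ formats, while this example knows only the standard JPEG and PNG magic bytes. The `carve` and `carved_name` helpers are hypothetical, and reading the number in names such as `JPG_0000001234.jpg` as a byte offset is an assumption.

```python
from typing import Dict, Iterator, Tuple

# (header, footer) pairs for two well-known formats.
SIGNATURES: Dict[str, Tuple[bytes, bytes]] = {
    "JPG": (b"\xff\xd8\xff", b"\xff\xd9"),
    "PNG": (b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
}


def carve(data: bytes, max_size: int = 10 * 1024 * 1024) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (kind, offset, content) for each header/footer pair found in data.

    max_size bounds how far past a header the footer search may look.
    """
    for kind, (header, footer) in SIGNATURES.items():
        start = data.find(header)
        while start != -1:
            end = data.find(footer, start + len(header), start + max_size)
            if end != -1:
                stop = end + len(footer)
                yield kind, start, data[start:stop]
                start = data.find(header, stop)  # resume after the carved file
            else:
                start = data.find(header, start + 1)  # no footer; try next header


def carved_name(kind: str, offset: int) -> str:
    """Build a name like JPG_0000001234.jpg (offset-based naming is assumed)."""
    return f"{kind}_{offset:010d}.{kind.lower()}"
```

A production carver would read the image in overlapping chunks instead of loading it whole, so that a signature straddling a chunk boundary is not missed.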
Pros & Cons
| Pros | Cons |
|---|---|
| 100 % open-source | No GUI (CLI only) |
| Cross-platform | Cannot break strong encryption |
| Extensible signatures | No RAID-5/6 rebuild |
| Chain-of-custody logs | Mobile crypto requires extra tools |
| Free for commercial use | Deleted data on TRIM-enabled SSDs is usually unrecoverable |
Road-Map
- GTK GUI for non-tech users
- Distributed GPU brute-force plug-in (Hashcat bridge)
- RAID-5 parity reconstruction module
- Android ADB bridge for live extraction
- DFIR playbook templates (STIX export)
Disclaimer & Legal
This software is provided for lawful use on devices you own or have explicit written permission to examine.
Unauthorised access may violate the Computer Fraud and Abuse Act (US), CMA (UK), or similar laws globally.
The authors accept no liability for misuse or data loss.
Contributing
PRs welcome! Please run black + flake8 and add a test case under tests/.
License
MIT © 2025 Alok Kumar. See the LICENSE file.