FleetForge — Hardware Platform Validation Toolkit

FleetForge is a stage-aware hardware validation framework for Linux-based datacenter servers.
It models real NPI (New Product Introduction) gates and produces structured JSON evidence
for infra, hardware, and production readiness decisions.

🧪 Google Colab (Reproducible Demo)

Colab notebook used to build and validate this project:
👉 https://colab.research.google.com/drive/1J8ElWi3FAXbB2ITDPgsa536UbQJvBt7c#scrollTo=FBwL7xjStacy

🚀 Why FleetForge?

Modern infra failures are rarely single-host issues; more often they are platform, firmware, or rollout-level problems.

FleetForge helps answer:

Which hardware component failed?
Is this a single-host or platform-wide issue?
Is the failure acceptable in bring-up but blocking for production?
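
The single-host versus platform-wide question can be approximated by counting how many hosts fail the same check. The sketch below is only an illustration and is not FleetForge's implementation: the `(host, check, passed)` record shape, the `classify_failures` function, the check names, and the 50% threshold are all assumptions made for this example.

```python
from collections import defaultdict

def classify_failures(results, platform_threshold=0.5):
    """Label each failing check 'single-host' or 'platform-wide'.

    results: iterable of (host, check, passed) tuples (hypothetical shape).
    platform_threshold: fraction of hosts that must fail the same check
    before it is treated as platform-wide (assumed value).
    """
    results = list(results)
    hosts = {host for host, _, _ in results}

    # Collect the set of hosts that failed each check.
    failed_hosts = defaultdict(set)
    for host, check, passed in results:
        if not passed:
            failed_hosts[check].add(host)

    labels = {}
    for check, failed in failed_hosts.items():
        ratio = len(failed) / len(hosts)
        labels[check] = "platform-wide" if ratio >= platform_threshold else "single-host"
    return labels

# Hypothetical results from three hosts.
results = [
    ("host-a", "nic.link_speed", False),
    ("host-b", "nic.link_speed", False),
    ("host-c", "nic.link_speed", False),
    ("host-a", "disk.smart", False),
    ("host-b", "disk.smart", True),
    ("host-c", "disk.smart", True),
]
labels = classify_failures(results)
print(labels)
```

Here the NIC check fails on every host and is labeled platform-wide, while the disk check fails on one host out of three and is labeled single-host. A real fleet would likely need a threshold tuned per platform.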

🧠 NPI Lifecycle Model

FleetForge models hardware readiness as stage-gated validation:

Bring-up Validation
Pre-production Qualification
Production Readiness
Post-deployment Verification
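
One way to represent these gates in code is an ordered enumeration, so that a check can declare the stage from which its failure becomes blocking. This is a sketch under assumptions: the `Stage` enum, its member names, and `is_blocking` are hypothetical and are not taken from FleetForge's policy engine.

```python
from enum import IntEnum

class Stage(IntEnum):
    """The four NPI stages, ordered from earliest to latest (names assumed)."""
    BRING_UP = 1
    PREPROD_QUALIFICATION = 2
    PROD_READINESS = 3
    POST_DEPLOYMENT = 4

def is_blocking(blocking_from: Stage, current: Stage) -> bool:
    """A failure blocks once the current stage reaches `blocking_from`."""
    return current >= blocking_from

# A failure that is tolerated during bring-up but blocks production readiness.
print(is_blocking(Stage.PROD_READINESS, Stage.BRING_UP))        # False
print(is_blocking(Stage.PROD_READINESS, Stage.PROD_READINESS))  # True
```

Using `IntEnum` keeps the stage comparison a plain integer comparison, which makes "acceptable in bring-up, blocking for production" a one-line rule.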

🔐 Safe-by-Default Design

Only safe, read-only checks run by default
Unsafe / experimental checks never run accidentally
Explicit opt-in required per check via the --enable-exp flag
Supports --dry-run to preview execution
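
The rules above can be sketched as a small selection function. This is an illustrative guess at the behavior, not the contents of policy.py: the `Check` dataclass, `plan_checks`, and the safe check name `storage.smart_health` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Check:
    name: str
    unsafe: bool = False  # unsafe checks need explicit opt-in

def plan_checks(checks, enabled_exp=frozenset()):
    """Split checks into (to_run, skipped) names.

    Safe checks always run. Unsafe checks run only if their name
    appears in `enabled_exp`.
    """
    to_run, skipped = [], []
    for check in checks:
        if check.unsafe and check.name not in enabled_exp:
            skipped.append(check.name)
        else:
            to_run.append(check.name)
    return to_run, skipped

checks = [
    Check("storage.smart_health"),              # hypothetical safe check
    Check("storage.fio_quick", unsafe=True),
    Check("network.iperf_smoke", unsafe=True),
]

# Default: nothing unsafe is selected.
print(plan_checks(checks))
# Opt in to a single unsafe check.
print(plan_checks(checks, enabled_exp={"storage.fio_quick"}))
```

A dry run could then print the `to_run` list and exit without executing anything, which is the preview behavior described above.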

📁 Repository Structure

FleetForge/
├── docs/
├── fleetforge/
│   ├── core/
│   │   ├── policy.py                 # Stage & safety policy engine
│   │   └── runner.py                 # Stage execution logic
│   ├── checks/
│   │   ├── storage/
│   │   │   └── fio_quick.py          # Disk smoke test (unsafe)
│   │   └── network/
│   │       └── iperf_smoke.py        # NIC throughput smoke test (unsafe)
│   └── stages/
│       ├── preprod_qualification.yaml
│       └── prod_readiness.yaml
├── out/
├── runbooks/
├── fleetforge_cli.py
├── requirements.txt
└── README.md

🧪 Unsafe / Experimental Checks (Opt-in)

These checks never run by accident:

storage.fio_quick: Disk I/O smoke test (can generate load)
network.iperf_smoke: NIC throughput smoke test (requires iperf target)

They must be explicitly enabled:

--enable-exp storage.fio_quick
--enable-exp network.iperf_smoke
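
A repeatable opt-in flag like this is commonly built with argparse's `append` action. The parser below is a minimal sketch that mirrors the flags shown in this README; it is an assumption about how such a CLI could be wired, and the real fleetforge_cli.py may differ.

```python
import argparse

def build_parser():
    """Build a parser with a `run` subcommand (layout assumed from the README)."""
    parser = argparse.ArgumentParser(prog="fleetforge_cli.py")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run a validation stage")
    run.add_argument("--stage", required=True)
    run.add_argument("--dry-run", action="store_true")
    # Each --enable-exp occurrence appends one check name; default is empty,
    # so no unsafe check is enabled unless the flag is given.
    run.add_argument("--enable-exp", action="append", default=[], metavar="CHECK")
    run.add_argument("--out")
    return parser

args = build_parser().parse_args([
    "run",
    "--stage", "preprod_qualification",
    "--dry-run",
    "--enable-exp", "storage.fio_quick",
    "--enable-exp", "network.iperf_smoke",
    "--out", "out/preprod.json",
])
print(args.enable_exp)
```

Defaulting `--enable-exp` to an empty list means an invocation without the flag enables no unsafe checks.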

▶️ Usage

Dry Run (recommended)

python fleetforge_cli.py run \
  --stage preprod_qualification \
  --dry-run \
  --enable-exp storage.fio_quick \
  --enable-exp network.iperf_smoke \
  --out out/preprod.json

Full Production Readiness Run

python fleetforge_cli.py run \
  --stage prod_readiness \
  --enable-exp storage.fio_quick \
  --enable-exp network.iperf_smoke \
  --out out/prod.json

📦 Outputs

FleetForge produces machine-readable JSON artifacts:

out/preprod.json
out/prod.json

These are designed to plug directly into:

CI pipelines
Infra dashboards
Capacity & reliability reviews
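
As an illustration of the CI use case, a pipeline step could read the artifact and fail the build when any check failed. The JSON layout used here (a top-level `results` list whose entries carry `check` and `status` fields) is a hypothetical schema chosen for this sketch; the actual FleetForge output format is not documented in this README.

```python
import json

def failed_checks(report):
    """Return the names of checks whose status is 'fail' (assumed schema)."""
    return [
        entry["check"]
        for entry in report.get("results", [])
        if entry.get("status") == "fail"
    ]

# Inline sample standing in for a file such as out/prod.json.
sample = """
{
  "stage": "prod_readiness",
  "results": [
    {"check": "storage.fio_quick", "status": "pass"},
    {"check": "network.iperf_smoke", "status": "fail"}
  ]
}
"""
report = json.loads(sample)
failed = failed_checks(report)
exit_code = 1 if failed else 0
print(failed, exit_code)
```

In a real pipeline the report would be loaded from the path passed to `--out`, and the script would end with `sys.exit(exit_code)` so that a failing check fails the job.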

📘 Runbooks

FleetForge links failures to actionable runbooks in runbooks/.

Examples:

Disk SMART / NVMe health failures
NIC speed / duplex mismatch
Throughput regressions
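
Linking a failure to its runbook can be as simple as a lookup table keyed by check name. The mapping below is a sketch only: the check names other than `network.iperf_smoke`, the runbook filenames, and `runbook_for` are hypothetical and do not describe the actual contents of runbooks/.

```python
from pathlib import PurePosixPath

RUNBOOK_DIR = PurePosixPath("runbooks")

# Hypothetical check-name to runbook-file mapping.
RUNBOOKS = {
    "storage.smart_health": "disk_smart_nvme_health.md",
    "network.link_speed": "nic_speed_duplex_mismatch.md",
    "network.iperf_smoke": "throughput_regression.md",
}

def runbook_for(check):
    """Return the runbook path for a failed check, or None if unmapped."""
    filename = RUNBOOKS.get(check)
    return str(RUNBOOK_DIR / filename) if filename else None

print(runbook_for("network.link_speed"))
print(runbook_for("unknown.check"))
```

Returning `None` for unmapped checks lets the caller flag a missing runbook explicitly instead of pointing at a nonexistent file.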

🔥 Philosophy

“Fail fast in bring-up.
Fail loud before production.
Never fail silently in the field.”

FleetForge enforces hardware truth before scale.

prakhardewangan2005-hash/FleetForge-Hardware-Platform-Validation