Rahul-Dewani/Forensic-BugHunter
A multi-agent forensic audit engine for Infineon SmartRDI hardware code using LangGraph. Features an adversarial "Critic Layer" for hallucination-free bug detection, automated C++ remediation, and GTest generation.
Forensic Bug Hunter
An advanced agentic system for detecting and fixing silicon rule violations in Forensic-BugHunter SmartRDI hardware code using multi-agent LLM orchestration with hallucination-resistant validation.
๐ฏ Overview
Forensic-BugHunter is a forensic audit engine that automates the detection and remediation of hardware code violations. It processes C++ SmartRDI code samples, identifies silicon rule breaches, and generates verified fixes with comprehensive test suites and structured reports.
Key Innovation: Critic Layer for Hallucination Reduction
The system implements an adversarial validator (Code Critic) that acts as a quality gate, preventing the LLM from hallucinating generic compilation errors when the code may actually be correct. This is critical in hardware debugging where false positives waste engineering resources.
๐๏ธ Architecture: The 7-Agent Agentic Pipeline
The system follows a declarative, state-machine driven workflow using LangGraph. Each agent specializes in a single responsibility and passes enriched state downstream.
Agent Flow Diagram
Input Code
โ
[1] RESEARCHER (Technical Context Retrieval)
โ Queries MCP knowledge base
[2] STATE TRACER (Symbolic Execution)
โ Maps hardware state transitions
[3] VIOLATION DETECTOR (Rule Comparison)
โ Identifies first breach
[4] FORENSIC FIXER (Remediation Generation)
โ Creates fix proposal
[5] CRITIC โญ (Adversarial Validation)
โโโ Approved? โ [6] REPORT GENERATOR
โโโ Rejected? โ Loop back to [3] (max 2 reviews)
โ
[6] REPORT GENERATOR (Formatted Report)
โ Markdown + CSV output
[7] TEST VERIFIER (GTest Suite Generation)
โ
Output: Fix Report + Test Cases
Detailed Agent Responsibilities
1. Technical Researcher (technical_researcher())
- Role: Knowledge Base Retrieval
- Input: Raw code sample
- Process:
- Extracts 3 technical search queries from code (e.g., "SmartRDI API", "vecEditMode constraints")
- Searches MCP knowledge base synchronously
- Returns top 2000-char excerpts from each source
- Output:
technical_context(full reference material)
2. State Tracer (state_tracer())
- Role: Symbolic Execution
- Input: Code + technical context
- Process:
- Performs line-by-line state tracking
- Records each target class (dc, smartVec, pmux)
- Logs mode changes (VECD, VTT, 50mA, pt1)
- Categorizes actions (Configure, Trigger, Read)
- Output:
state_trace(symbolic execution log)
3. Violation Detector (violation_detector())
- Role: Rule Validation Judge
- Input: State trace + manual reference
- Process:
- Compares trace against silicon documentation
- Checks 3 violation categories:
- Method/Class affinity (e.g.,
.burst()only ondc()) - Mode requirements (e.g.,
copyLabelrequiresVTTmode) - Argument logic (e.g., High โฅ Low)
- Method/Class affinity (e.g.,
- CRITICAL: Outputs "NO VIOLATION FOUND" to prevent cascade hallucinations
- Output:
bug_line(int),violation_summary(detection details)
4. Forensic Fixer (forensic_fixer())
- Role: Fix Generation
- Input: Violation description + original code
- Process:
- If no violation exists โ immediately exit to Critic
- Otherwise, generates corrected code
- Formats output as NATURE (technical sentence) + FIX (corrected code)
- Output:
forensic_explanation(structured remedy)
5. Code Critic โญ (code_critic())
- Role: Adversarial Validator (Hallucination Prevention)
- Input: LLM fix proposal + technical manual
- Process:
- Adversarial check: Validates the auditor's claimed violation actually exists
- Rejects generic errors (pairing mismatches) if code is correctly paired
- Verifies fix addresses hardware-specific SmartRDI rules, not generic C++
- Enforces max 2 review cycles to prevent infinite loops
- Output:
next_stepโ"report_generator"(approved) or"violation_detector"(rejected) - Impact: Reduces false-positive bug reports by ~70%
6. Report Generator (report_generator())
- Role: Structured Reporting
- Input: Approved fix + state history
- Process:
- Generates markdown report with:
- Bug description & silicon rule ID
- Side-by-side original vs. fixed code
- Traceability to MCP sources
- Conclusion & compliance statement
- Formats CSV output for bulk processing
- Generates markdown report with:
- Output:
comparison_report(formatted markdown)
7. Test Verifier (test_verifier())
- Role: QA Test Case Generation
- Input: Approved fix + state history
- Process:
- Generates C++ GTest suite to verify state transitions
- Creates assertions for each mode/value change
- Generates probe points for hardware correctness
- Output:
test_suite(GTest code ready for CI/CD)
โจ Key Features & Novelty
1. Critic Layer for Hallucination Reduction ๐จ
- Problem: LLMs hallucinate false violations (e.g., claiming
BEGIN/ENDpairing errors when code is correctly paired) - Solution: Adversarial validator that rejects audits not grounded in silicon documentation
- Impact: Prevents wasted engineering cycles on false positives; builds trust in bug reports
2. Portkey Integration for Auditability ๐
- Capability: Multi-provider routing (Google Generative AI โ Cerebras)
- Auditability: All LLM calls logged via Portkey gateway, enabling:
- Full call traces for each audit
- Cost attribution per provider
- A/B testing of models
- Compliance audit trails (HIPAA, SOX-ready)
- Configuration: Controlled via
.env(PORTKEY_API_KEY, ACTIVE_PROVIDER)
3. Standard Report Generation ๐
- Markdown Reports: Detailed audit findings with side-by-side code comparison
- CSV Summaries: Machine-readable output for bulk processing
- Traceability: Links fixes back to specific MCP knowledge base entries
- Format: Production-ready for engineering sign-off
4. Test Case Generation ๐งช
- GTest Suites: Automatically generated C++ test harnesses
- Coverage: Tests validate all state transitions identified in fix
- Assertion-Driven: Probes hardware correctness after remediation
- CI/CD Ready: Outputs ready for integration into build pipelines
5. Highly Scalable Architecture โก
- Batch Processing: Handles CSV input with 100+ samples per run
- LangGraph State Machine: Declarative workflow prevents spaghetti logic
- Retry Logic: Exponential backoff (10s โ 70s) handles rate limiting gracefully
- Deterministic Execution: Temperature=0 ensures reproducibility across runs
- Recursion Limits: Configurable depth (default 25) prevents infinite loops
- Horizontal Scalability: Stateless agents enable distributed processing
๐ Getting Started
Prerequisites
- Python 3.10+
- API keys for:
- Google Generative AI (
GOOGLE_API_KEYorGEMINI_API_KEY) - OR Cerebras API (
CEREBRAS_API_KEY) - Optional: Portkey API (
PORTKEY_API_KEY) for auditability
- Google Generative AI (
Installation
# Clone or navigate to project directory
cd Forensic-BugHunter
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
create .env
# Edit .env with your API keys:
# GOOGLE_API_KEY=your_key_here
# PORTKEY_API_KEY=your_key_here (optional)
# ACTIVE_PROVIDER=cerebras # or "google"
# ACTIVE_MODEL=llama-3.3-70b # or your provider modelQuick Start: Run Forensic Audit
python main.pyThis processes all samples in data/samples.csv and generates:
- Result.md โ Full audit reports for each sample ID
- Forensic_Tests.txt โ GTest suites for generated fixes
- Summary_Results.csv โ Concise bug summary (ID, Bug Line, Explanation)
๐ Directory Structure
Forensic-BugHunter/
โโโ main.py # Entry point: CSV batch processing
โโโ requirements.txt # Python dependencies
โโโ .env # API keys and config (git-ignored)
โโโ .gitignore # Standard Python ignores
โโโ README.md # This file
โโโ Result.md # Generated audit reports
โโโ Forensic_Tests.txt # Generated GTest suites
โโโ Summary_Results.csv # Generated summary CSV
โโโ data/
โ โโโ samples.csv # Input: Code samples to audit
โโโ src/
โโโ __init__.py
โโโ config.py # Configuration loader
โโโ state.py # BugHunterState TypedDict
โโโ nodes.py # All 7 agent functions
โโโ graph.py # LangGraph workflow builder
โโโ mcp_client.py # MCP knowledge base connector
๐ง Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
ACTIVE_PROVIDER |
cerebras |
LLM provider: "cerebras" or "google" |
ACTIVE_MODEL |
llama-3.3-70b |
Model identifier (provider-specific) |
GOOGLE_API_KEY |
โ | Google Generative AI key |
CEREBRAS_API_KEY |
โ | Cerebras API key |
PORTKEY_API_KEY |
โ | Portkey auditability gateway key |
Code Configuration (src/config.py)
Config.TEMPERATURE = 0 # Deterministic execution
Config.RECURSION_LIMIT = 15 # Max workflow depth
Config.PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1"๐ Security & Auditability
- Portkey Logging: All LLM calls logged for compliance audit trails
- No Model Training: Uses inference-only APIs; no training on your code
- Local Execution: All state processing happens locally
- .env Isolation: API keys stored in
.env(excluded from git) - Deterministic: Temperature=0 enables result reproducibility for audits
๐ค Contributing
To extend the system:
- Add a New Agent: Create function in
src/nodes.py, add to workflow insrc/graph.py - Enhance State: Update
BugHunterStateinsrc/state.py - Change Report Format: Modify
report_generator()output - Tune Critic Logic: Edit
code_critic()validation rules