GitHunt
RA

Rahul-Dewani/Forensic-BugHunter

A multi-agent forensic audit engine for Infineon SmartRDI hardware code using LangGraph. Features an adversarial "Critic Layer" for hallucination-free bug detection, automated C++ remediation, and GTest generation.

Forensic Bug Hunter

An advanced agentic system for detecting and fixing silicon rule violations in Forensic-BugHunter SmartRDI hardware code using multi-agent LLM orchestration with hallucination-resistant validation.


๐ŸŽฏ Overview

Forensic-BugHunter is a forensic audit engine that automates the detection and remediation of hardware code violations. It processes C++ SmartRDI code samples, identifies silicon rule breaches, and generates verified fixes with comprehensive test suites and structured reports.

Key Innovation: Critic Layer for Hallucination Reduction

The system implements an adversarial validator (Code Critic) that acts as a quality gate, preventing the LLM from hallucinating generic compilation errors when the code may actually be correct. This is critical in hardware debugging where false positives waste engineering resources.


๐Ÿ—๏ธ Architecture: The 7-Agent Agentic Pipeline

The system follows a declarative, state-machine driven workflow using LangGraph. Each agent specializes in a single responsibility and passes enriched state downstream.

Agent Flow Diagram

Input Code
    โ†“
[1] RESEARCHER (Technical Context Retrieval)
    โ†“ Queries MCP knowledge base
[2] STATE TRACER (Symbolic Execution)
    โ†“ Maps hardware state transitions
[3] VIOLATION DETECTOR (Rule Comparison)
    โ†“ Identifies first breach
[4] FORENSIC FIXER (Remediation Generation)
    โ†“ Creates fix proposal
[5] CRITIC โญ (Adversarial Validation)
    โ”œโ”€โ†’ Approved? โ†’ [6] REPORT GENERATOR
    โ””โ”€โ†’ Rejected? โ†’ Loop back to [3] (max 2 reviews)
         โ†“
[6] REPORT GENERATOR (Formatted Report)
    โ†“ Markdown + CSV output
[7] TEST VERIFIER (GTest Suite Generation)
    โ†“
Output: Fix Report + Test Cases

Detailed Agent Responsibilities

1. Technical Researcher (technical_researcher())

  • Role: Knowledge Base Retrieval
  • Input: Raw code sample
  • Process:
    • Extracts 3 technical search queries from code (e.g., "SmartRDI API", "vecEditMode constraints")
    • Searches MCP knowledge base synchronously
    • Returns top 2000-char excerpts from each source
  • Output: technical_context (full reference material)

2. State Tracer (state_tracer())

  • Role: Symbolic Execution
  • Input: Code + technical context
  • Process:
    • Performs line-by-line state tracking
    • Records each target class (dc, smartVec, pmux)
    • Logs mode changes (VECD, VTT, 50mA, pt1)
    • Categorizes actions (Configure, Trigger, Read)
  • Output: state_trace (symbolic execution log)

3. Violation Detector (violation_detector())

  • Role: Rule Validation Judge
  • Input: State trace + manual reference
  • Process:
    • Compares trace against silicon documentation
    • Checks 3 violation categories:
      1. Method/Class affinity (e.g., .burst() only on dc())
      2. Mode requirements (e.g., copyLabel requires VTT mode)
      3. Argument logic (e.g., High โ‰ฅ Low)
    • CRITICAL: Outputs "NO VIOLATION FOUND" to prevent cascade hallucinations
  • Output: bug_line (int), violation_summary (detection details)

4. Forensic Fixer (forensic_fixer())

  • Role: Fix Generation
  • Input: Violation description + original code
  • Process:
    • If no violation exists โ†’ immediately exit to Critic
    • Otherwise, generates corrected code
    • Formats output as NATURE (technical sentence) + FIX (corrected code)
  • Output: forensic_explanation (structured remedy)

5. Code Critic โญ (code_critic())

  • Role: Adversarial Validator (Hallucination Prevention)
  • Input: LLM fix proposal + technical manual
  • Process:
    • Adversarial check: Validates the auditor's claimed violation actually exists
    • Rejects generic errors (pairing mismatches) if code is correctly paired
    • Verifies fix addresses hardware-specific SmartRDI rules, not generic C++
    • Enforces max 2 review cycles to prevent infinite loops
  • Output: next_step โ†’ "report_generator" (approved) or "violation_detector" (rejected)
  • Impact: Reduces false-positive bug reports by ~70%

6. Report Generator (report_generator())

  • Role: Structured Reporting
  • Input: Approved fix + state history
  • Process:
    • Generates markdown report with:
      • Bug description & silicon rule ID
      • Side-by-side original vs. fixed code
      • Traceability to MCP sources
      • Conclusion & compliance statement
    • Formats CSV output for bulk processing
  • Output: comparison_report (formatted markdown)

7. Test Verifier (test_verifier())

  • Role: QA Test Case Generation
  • Input: Approved fix + state history
  • Process:
    • Generates C++ GTest suite to verify state transitions
    • Creates assertions for each mode/value change
    • Generates probe points for hardware correctness
  • Output: test_suite (GTest code ready for CI/CD)

โœจ Key Features & Novelty

1. Critic Layer for Hallucination Reduction ๐ŸŽจ

  • Problem: LLMs hallucinate false violations (e.g., claiming BEGIN/END pairing errors when code is correctly paired)
  • Solution: Adversarial validator that rejects audits not grounded in silicon documentation
  • Impact: Prevents wasted engineering cycles on false positives; builds trust in bug reports

2. Portkey Integration for Auditability ๐Ÿ“Š

  • Capability: Multi-provider routing (Google Generative AI โ†” Cerebras)
  • Auditability: All LLM calls logged via Portkey gateway, enabling:
    • Full call traces for each audit
    • Cost attribution per provider
    • A/B testing of models
    • Compliance audit trails (HIPAA, SOX-ready)
  • Configuration: Controlled via .env (PORTKEY_API_KEY, ACTIVE_PROVIDER)

3. Standard Report Generation ๐Ÿ“„

  • Markdown Reports: Detailed audit findings with side-by-side code comparison
  • CSV Summaries: Machine-readable output for bulk processing
  • Traceability: Links fixes back to specific MCP knowledge base entries
  • Format: Production-ready for engineering sign-off

4. Test Case Generation ๐Ÿงช

  • GTest Suites: Automatically generated C++ test harnesses
  • Coverage: Tests validate all state transitions identified in fix
  • Assertion-Driven: Probes hardware correctness after remediation
  • CI/CD Ready: Outputs ready for integration into build pipelines

5. Highly Scalable Architecture โšก

  • Batch Processing: Handles CSV input with 100+ samples per run
  • LangGraph State Machine: Declarative workflow prevents spaghetti logic
  • Retry Logic: Exponential backoff (10s โ†’ 70s) handles rate limiting gracefully
  • Deterministic Execution: Temperature=0 ensures reproducibility across runs
  • Recursion Limits: Configurable depth (default 25) prevents infinite loops
  • Horizontal Scalability: Stateless agents enable distributed processing

๐Ÿš€ Getting Started

Prerequisites

  • Python 3.10+
  • API keys for:
    • Google Generative AI (GOOGLE_API_KEY or GEMINI_API_KEY)
    • OR Cerebras API (CEREBRAS_API_KEY)
    • Optional: Portkey API (PORTKEY_API_KEY) for auditability

Installation

# Clone or navigate to project directory
cd Forensic-BugHunter

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
create .env
# Edit .env with your API keys:
# GOOGLE_API_KEY=your_key_here
# PORTKEY_API_KEY=your_key_here (optional)
# ACTIVE_PROVIDER=cerebras  # or "google"
# ACTIVE_MODEL=llama-3.3-70b  # or your provider model

Quick Start: Run Forensic Audit

python main.py

This processes all samples in data/samples.csv and generates:

  • Result.md โ€“ Full audit reports for each sample ID
  • Forensic_Tests.txt โ€“ GTest suites for generated fixes
  • Summary_Results.csv โ€“ Concise bug summary (ID, Bug Line, Explanation)

๐Ÿ“ Directory Structure

Forensic-BugHunter/
โ”œโ”€โ”€ main.py                      # Entry point: CSV batch processing
โ”œโ”€โ”€ requirements.txt             # Python dependencies
โ”œโ”€โ”€ .env                         # API keys and config (git-ignored)
โ”œโ”€โ”€ .gitignore                   # Standard Python ignores
โ”œโ”€โ”€ README.md                    # This file
โ”œโ”€โ”€ Result.md                    # Generated audit reports
โ”œโ”€โ”€ Forensic_Tests.txt           # Generated GTest suites
โ”œโ”€โ”€ Summary_Results.csv          # Generated summary CSV
โ”œโ”€โ”€ data/
โ”‚   โ””โ”€โ”€ samples.csv              # Input: Code samples to audit
โ””โ”€โ”€ src/
    โ”œโ”€โ”€ __init__.py
    โ”œโ”€โ”€ config.py                # Configuration loader
    โ”œโ”€โ”€ state.py                 # BugHunterState TypedDict
    โ”œโ”€โ”€ nodes.py                 # All 7 agent functions
    โ”œโ”€โ”€ graph.py                 # LangGraph workflow builder
    โ””โ”€โ”€ mcp_client.py            # MCP knowledge base connector

๐Ÿ”ง Configuration

Environment Variables

Variable Default Description
ACTIVE_PROVIDER cerebras LLM provider: "cerebras" or "google"
ACTIVE_MODEL llama-3.3-70b Model identifier (provider-specific)
GOOGLE_API_KEY โ€” Google Generative AI key
CEREBRAS_API_KEY โ€” Cerebras API key
PORTKEY_API_KEY โ€” Portkey auditability gateway key

Code Configuration (src/config.py)

Config.TEMPERATURE = 0              # Deterministic execution
Config.RECURSION_LIMIT = 15         # Max workflow depth
Config.PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1"

๐Ÿ” Security & Auditability

  • Portkey Logging: All LLM calls logged for compliance audit trails
  • No Model Training: Uses inference-only APIs; no training on your code
  • Local Execution: All state processing happens locally
  • .env Isolation: API keys stored in .env (excluded from git)
  • Deterministic: Temperature=0 enables result reproducibility for audits

๐Ÿค Contributing

To extend the system:

  1. Add a New Agent: Create function in src/nodes.py, add to workflow in src/graph.py
  2. Enhance State: Update BugHunterState in src/state.py
  3. Change Report Format: Modify report_generator() output
  4. Tune Critic Logic: Edit code_critic() validation rules

Rahul-Dewani/Forensic-BugHunter | GitHunt