SpecWeave: Scalable, Hierarchy-Aware RTL Functional Recovery using LLMs
This repository provides the code and benchmarks for the SpecWeave framework, which uses Large Language Models (LLMs) to automatically generate comprehensive, hierarchy-aware specification documents from Register-Transfer Level (RTL) code.
SpecWeave addresses the chronically poor state of documentation in hardware design by employing a compositional approach that sidesteps LLM context-window limits, achieving high coverage (mean ≈ 85.3%) and accuracy (mean ≈ 90.8%) in functional recovery.
Repository Structure
The repository is organized into the following high-level directories:
- `Benchmarks/`: Contains the 12 OpenCores projects used for evaluation. Each project is structured with:
  - `RTL/`: Verilog source files (currently only Verilog is supported by the underlying Pyverilog library).
  - `golden_spec/`: The original, human-written specification documents accompanying the project.
  - `generated_spec/`: The specification documents generated by the SpecWeave framework.
- `Scripts/`: Contains the Python implementation of the SpecWeave framework.
  - `Case Study USB/`: Scripts and results of the case study mentioned in the paper.
Prerequisites and Setup
Package Requirements
The core framework relies on the following key Python packages:

- `langchain`
- `pyverilog`
API Key Configuration
Most scripts require setting your API key for the LLM service. You must set an environment variable, typically at the beginning of your main script or environment setup:

```python
import os

# Replace with your actual key
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"
```

Assumptions
The current implementation only supports Verilog code, as it relies on the underlying Pyverilog parser for Control and Data-Flow Graph (CDFG) construction and structural analysis.
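Since every stage calls out to the LLM service, it can help to fail fast when the key is missing. A minimal sketch, assuming a hypothetical `require_api_key` helper that is not part of the repository (the SpecWeave scripts themselves simply read `os.environ["OPENAI_API_KEY"]`):

```python
import os

def require_api_key(env=None):
    """Return the OpenAI API key, raising early with a clear error if unset.

    Hypothetical convenience helper, not part of the SpecWeave scripts.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key
```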
Framework Flow and Execution
The SpecWeave framework proceeds in two main stages: HSG Generation (Stage I) and Spec Synthesis/Verification (Stage II).
1. Stage I: Hierarchical Specification Graph (HSG) Generation
The HSG is the foundational knowledge base, built bottom-up by summarizing module semantics under bounded context.
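The bottom-up construction can be pictured as a post-order traversal of the module hierarchy; the dict-based data model and the `summarize` callback below are illustrative assumptions, not the repository's actual classes:

```python
# Illustrative sketch of bottom-up HSG construction (assumed data model):
# each module is summarized from its own RTL plus the already-summarized
# specs of its submodules, so each LLM call sees only a bounded context.
def build_hsg(module, summarize):
    child_specs = [build_hsg(child, summarize) for child in module["submodules"]]
    module["spec"] = summarize(module["name"], module["rtl"], child_specs)
    return module["spec"]
```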
- Required Scripts: `genHSG.py`, `cdfg.py`, `class_def.py`
- Placement: Ensure `genHSG.py`, `cdfg.py`, and `class_def.py` are located in the same directory as the Verilog RTL code you are analyzing.
- Execution: Run the `genHSG.py` script, providing the top module name of your design:

  ```
  python genHSG.py <top_module_name>
  ```

- Output: This will generate a pickle file containing the Hierarchical Specification Graph (HSG): `<top_module_name>.pkl`
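The pickle can be reloaded in later stages. A minimal loader sketch (note that unpickling requires the HSG classes from `class_def.py` to be importable):

```python
import pickle

def load_hsg(path):
    # Deserialize the HSG produced by genHSG.py.
    # The classes defined in class_def.py must be on sys.path for this to work.
    with open(path, "rb") as f:
        return pickle.load(f)
```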
2. Stage II: Specification Synthesis and Protocol Verification
This stage utilizes the Generic Recursive Reasoning Algorithm (GRRA) to query the HSG and generate the specification content.
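GRRA itself is an LLM-driven procedure; purely as a structural sketch (the control flow, data model, and callbacks below are assumptions, not the repository's implementation), a recursive HSG query descends into submodules when the local summary cannot answer the question:

```python
# Rough structural sketch of recursive HSG querying (all details assumed).
def recursive_query(node, question, answer, combine):
    local = answer(node["spec"], question)
    if local is not None or not node["submodules"]:
        return local
    child_answers = [recursive_query(c, question, answer, combine)
                     for c in node["submodules"]]
    return combine([a for a in child_answers if a is not None])
```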
2.1 Protocol Compliance Check (Example: USB Case Study)
This part of the flow is not fully automated; you must manually identify candidate protocols (e.g., USB, Wishbone) to check against and provide their specification documents.
1. Extract Protocol Claim Points:
   - Setup: Create a folder named `chapters/` and place the relevant chapters of the protocol specification (e.g., USB 1.0) in it as individual PDF files. You can also create an `images/` folder for images used in the document.
   - Script: Open `protocol_parser.py` and modify the variable `protocol_name = 'USB_1_0'` to match the current candidate protocol.
   - Execution: Run the claim point extractor:

     ```
     python protocol_parser.py
     ```

   - Output: A pickle file storing the extracted claim points: `protocol_gpt_response_<protocol_name>.pkl`
2. Run Protocol Check:
   - Script: `protocol_check.py`
   - Execution: This script loads the HSG (`<top_module_name>.pkl`) and the extracted protocol claims.

     ```
     python protocol_check.py <top_module_name> <protocol_name>
     # Example: python protocol_check.py usb_device_core USB_1_0
     ```

   - Output: The verification result against the protocol: `protocol_verification_response_{protocol_name}.pkl`
3. Final Protocol Compliance Report (USB Example):
   - Script: `final_protocol_check.py`
   - Configuration: Modify the following variables in the script to load the appropriate intermediate results:

     ```
     p1 = 'USB_1_0'
     p2 = 'USB_2_0'
     p = 'USB'  # Final protocol name
     ```

   - Execution:

     ```
     python final_protocol_check.py
     ```

   - Output: The final compliance report for the protocol: `{p}_verification_response.pkl`

Note: Repeat steps 1-3 for any other protocols present in the system (e.g., Wishbone B.3 and B.4).
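To illustrate what the final merge across protocol versions might look like, the sketch below combines per-version claim verdicts; the `{claim: bool}` pickle schema and the or-combination rule are assumptions, not the actual logic of `final_protocol_check.py`:

```python
# Assumed schema: {claim_text: bool} per protocol version; a claim is
# treated as satisfied overall if any checked version satisfies it.
def merge_verdicts(results_by_version):
    merged = {}
    for _version, claims in results_by_version.items():
        for claim, ok in claims.items():
            merged[claim] = merged.get(claim, False) or ok
    return merged
```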
2.2 Table of Contents (ToC) Synthesis
This uses GRRA to generate a refined, context-aware ToC for the specification document.
- Script: `gen_spec_section.py`
- Required Inputs: `<top_module_name>.pkl` and all `*{p}_verification_response.pkl` files.
- Configuration:
  - Modify lines 205-212 to load all generated protocol verification response files.
  - Modify lines 232-244 to adjust the major section titles for the specification.
- Execution:

  ```
  python gen_spec_section.py <top_module_name>
  ```

- Output: A JSON file containing the list of subsections under each section: `{top_module_name}_subsections.json`
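Assuming the ToC file maps section titles to lists of subsection titles (the exact schema is defined by `gen_spec_section.py`), it can be flattened for inspection like this:

```python
import json

# Assumed layout: {"Section Title": ["Subsection A", "Subsection B"], ...}
def flatten_toc(path):
    with open(path) as f:
        toc = json.load(f)
    return [(section, sub) for section, subs in toc.items() for sub in subs]
```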
2.3 Specification Document Synthesis
This is the final content generation step, where GRRA populates each subsection.
- Script: `gen_spec.py`
- Configuration: Modify lines 485-492 to load the appropriate protocol verification response files.
- Execution:

  ```
  python gen_spec.py <top_module_name>
  ```

- Output: The complete specification in JSON format.
2.4 Export Specification
- Script: `export_spec.py`
- Execution:

  ```
  python export_spec.py
  ```

- Output: Converts the JSON specification into the final PDF document.
Verification and Metrics
The following scripts are used to evaluate the quality of the generated specification against the golden specification.
- Golden Specification Claim Point Extraction:
  - Script: `spec_claim_point_parser.py` (located in the `Golden_Spec_Claim_Point_Extractor/` folder)
  - Extracts atomic, verifiable statements (claim points) from the human-written golden specification.
- Verification Check:
  - Script: `spec_checker.py`
  - Compares the generated specification's claim points against the extracted golden claim points to measure Coverage and Accuracy.
- Generate Summary Excel:
  - Script: `gen_excel.py`
  - Generates an Excel summary of the final metrics (Coverage and Accuracy) across the benchmarks.
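One plausible reading of the two metrics (the definitions below are assumptions; see the paper and `spec_checker.py` for the authoritative ones) treats coverage as the fraction of golden claim points recovered and accuracy as the fraction of generated claim points that are correct:

```python
# Illustrative metric definitions (assumed, not the repository's exact ones).
def coverage(golden_claims, recovered_claims):
    # Fraction of golden claim points that the generated spec recovers.
    golden = set(golden_claims)
    return len(golden & set(recovered_claims)) / len(golden)

def accuracy(generated_claims, correct_claims):
    # Fraction of generated claim points judged correct.
    generated = set(generated_claims)
    return len(generated & set(correct_claims)) / len(generated)
```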