SpecWeave: Scalable, Hierarchy-Aware RTL Functional Recovery using LLMs
This repository provides the code and benchmarks for the SpecWeave framework, which uses Large Language Models (LLMs) to automatically generate comprehensive, hierarchy-aware specification documents from Register-Transfer Level (RTL) code.
SpecWeave addresses the chronically poor state of documentation in hardware design by employing a compositional approach that sidesteps LLM context-window limits, achieving high coverage (mean ≈ 85.3%) and accuracy (mean ≈ 90.8%) in functional recovery.
Repository Structure
The repository is organized into the following high-level directories:
- `Benchmarks/`: Contains the 12 OpenCores projects used for evaluation. Each project is structured with:
  - `RTL/`: Verilog source files (currently only Verilog is supported by the underlying Pyverilog library).
  - `golden_spec/`: The original, human-written specification documents accompanying the project.
  - `generated_spec/`: The specification documents generated by the SpecWeave framework.
- `Scripts/`: Contains the Python implementation of the SpecWeave framework.
  - `Case Study USB/`: Scripts and results of the case study mentioned in the paper.
Prerequisites and Setup
Package Requirements
The core framework relies on the following key Python packages:

- `langchain`
- `pyverilog`
API Key Configuration
Most scripts require setting your API key for the LLM service. You must set an environment variable, typically at the beginning of your main script or environment setup:

```python
import os

# Replace with your actual key
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_HERE"
```

Assumptions
The current implementation only supports Verilog code, as it relies on the underlying Pyverilog parser for Control and Data-Flow Graph (CDFG) construction and structural analysis.
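Since every stage calls out to the LLM service, it can help to fail fast when the key is missing. A minimal sketch, assuming a hypothetical `require_api_key` helper that is not part of the repository (the SpecWeave scripts themselves simply read `os.environ["OPENAI_API_KEY"]`):

```python
import os

def require_api_key(env=None):
    """Return the OpenAI API key, raising early with a clear error if unset.

    Hypothetical convenience helper, not part of the SpecWeave scripts.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key
```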
Framework Flow and Execution
The SpecWeave framework proceeds in two main stages: HSG Generation (Stage I) and Spec Synthesis/Verification (Stage II).
1. Stage I: Hierarchical Specification Graph (HSG) Generation
The HSG is the foundational knowledge base, built bottom-up by summarizing module semantics under bounded context.
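The bottom-up construction can be pictured as a post-order traversal of the module hierarchy; the dict-based data model and the `summarize` callback below are illustrative assumptions, not the repository's actual classes:

```python
# Illustrative sketch of bottom-up HSG construction (assumed data model):
# each module is summarized from its own RTL plus the already-summarized
# specs of its submodules, so each LLM call sees only a bounded context.
def build_hsg(module, summarize):
    child_specs = [build_hsg(child, summarize) for child in module["submodules"]]
    module["spec"] = summarize(module["name"], module["rtl"], child_specs)
    return module["spec"]
```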
- Required Scripts: `genHSG.py`, `cdfg.py`, `class_def.py`
- Placement: Ensure `genHSG.py`, `cdfg.py`, and `class_def.py` are located in the same directory as the Verilog RTL code you are analyzing.
- Execution: Run the `genHSG.py` script, providing the top module name of your design:

  ```
  python genHSG.py <top_module_name>
  ```

- Output: This will generate a pickle file containing the Hierarchical Specification Graph (HSG): `<top_module_name>.pkl`
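The pickle can be reloaded in later stages. A minimal loader sketch (note that unpickling requires the HSG classes from `class_def.py` to be importable):

```python
import pickle

def load_hsg(path):
    # Deserialize the HSG produced by genHSG.py.
    # The classes defined in class_def.py must be on sys.path for this to work.
    with open(path, "rb") as f:
        return pickle.load(f)
```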
2. Stage II: Specification Synthesis and Protocol Verification
This stage utilizes the Generic Recursive Reasoning Algorithm (GRRA) to query the HSG and generate the specification content.
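GRRA itself is an LLM-driven procedure; purely as a structural sketch (the control flow, data model, and callbacks below are assumptions, not the repository's implementation), a recursive HSG query descends into submodules when the local summary cannot answer the question:

```python
# Rough structural sketch of recursive HSG querying (all details assumed).
def recursive_query(node, question, answer, combine):
    local = answer(node["spec"], question)
    if local is not None or not node["submodules"]:
        return local
    child_answers = [recursive_query(c, question, answer, combine)
                     for c in node["submodules"]]
    return combine([a for a in child_answers if a is not None])
```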
2.1 Protocol Compliance Check (Example: USB Case Study)
This part of the flow is not fully automated; you must manually identify candidate protocols (e.g., USB, Wishbone) to check against and provide their specification documents.
1. Extract Protocol Claim Points:
   - Setup: Create a folder named `chapters/` and place the relevant chapters of the protocol specification (e.g., USB 1.0) in it as individual PDF files. You can also create an `images/` folder for images used in the document.
   - Script: Open `protocol_parser.py` and modify the variable `protocol_name = 'USB_1_0'` to match the current candidate protocol.
   - Execution: Run the claim point extractor:

     ```
     python protocol_parser.py
     ```

   - Output: A pickle file storing the extracted claim points: `protocol_gpt_response_<protocol_name>.pkl`
2. Run Protocol Check:
   - Script: `protocol_check.py`
   - Execution: This script loads the HSG (`<top_module_name>.pkl`) and the extracted protocol claims.

     ```
     python protocol_check.py <top_module_name> <protocol_name>
     # Example: python protocol_check.py usb_device_core USB_1_0
     ```

   - Output: The verification result against the protocol: `protocol_verification_response_{protocol_name}.pkl`
3. Final Protocol Compliance Report (USB Example):
   - Script: `final_protocol_check.py`
   - Configuration: Modify the following variables in the script to load the appropriate intermediate results:

     ```
     p1 = 'USB_1_0'
     p2 = 'USB_2_0'
     p = 'USB'  # Final protocol name
     ```

   - Execution:

     ```
     python final_protocol_check.py
     ```

   - Output: The final compliance report for the protocol: `{p}_verification_response.pkl`

Note: Repeat steps 1-3 for any other protocols present in the system (e.g., Wishbone B.3 and B.4).
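To illustrate what the final merge across protocol versions might look like, the sketch below combines per-version claim verdicts; the `{claim: bool}` pickle schema and the or-combination rule are assumptions, not the actual logic of `final_protocol_check.py`:

```python
# Assumed schema: {claim_text: bool} per protocol version; a claim is
# treated as satisfied overall if any checked version satisfies it.
def merge_verdicts(results_by_version):
    merged = {}
    for _version, claims in results_by_version.items():
        for claim, ok in claims.items():
            merged[claim] = merged.get(claim, False) or ok
    return merged
```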
2.2 Table of Contents (ToC) Synthesis
This uses GRRA to generate a refined, context-aware ToC for the specification document.
- Script: `gen_spec_section.py`
- Required Inputs: `<top_module_name>.pkl` and all `*{p}_verification_response.pkl` files.
- Configuration:
  - Modify lines 205-212 to load all generated protocol verification response files.
  - Modify lines 232-244 to adjust the major section titles for the specification.
- Execution:

  ```
  python gen_spec_section.py <top_module_name>
  ```

- Output: A JSON file containing the list of subsections under each section: `{top_module_name}_subsections.json`
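Assuming the ToC file maps section titles to lists of subsection titles (the exact schema is defined by `gen_spec_section.py`), it can be flattened for inspection like this:

```python
import json

# Assumed layout: {"Section Title": ["Subsection A", "Subsection B"], ...}
def flatten_toc(path):
    with open(path) as f:
        toc = json.load(f)
    return [(section, sub) for section, subs in toc.items() for sub in subs]
```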
2.3 Specification Document Synthesis
This is the final content generation step, where GRRA populates each subsection.
- Script: `gen_spec.py`
- Configuration: Modify lines 485-492 to load the appropriate protocol verification response files.
- Execution:

  ```
  python gen_spec.py <top_module_name>
  ```

- Output: The complete specification in JSON format.
2.4 Export Specification
- Script: `export_spec.py`
- Execution:

  ```
  python export_spec.py
  ```

- Output: Converts the JSON specification into the final PDF document.
Verification and Metrics
The following scripts are used to evaluate the quality of the generated specification against the golden specification.
- Golden Specification Claim Point Extraction:
  - Script: `spec_claim_point_parser.py` (located in the `Golden_Spec_Claim_Point_Extractor/` folder)
  - Extracts atomic, verifiable statements (claim points) from the human-written golden specification.
- Verification Check:
  - Script: `spec_checker.py`
  - Compares the generated specification's claim points against the extracted golden claim points to measure Coverage and Accuracy.
- Generate Summary Excel:
  - Script: `gen_excel.py`
  - Generates an Excel summary of the final metrics (Coverage and Accuracy) across the benchmarks.
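One plausible reading of the two metrics (the definitions below are assumptions; see the paper and `spec_checker.py` for the authoritative ones) treats coverage as the fraction of golden claim points recovered and accuracy as the fraction of generated claim points that are correct:

```python
# Illustrative metric definitions (assumed, not the repository's exact ones).
def coverage(golden_claims, recovered_claims):
    # Fraction of golden claim points that the generated spec recovers.
    golden = set(golden_claims)
    return len(golden & set(recovered_claims)) / len(golden)

def accuracy(generated_claims, correct_claims):
    # Fraction of generated claim points judged correct.
    generated = set(generated_claims)
    return len(generated & set(correct_claims)) / len(generated)
```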