QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
arXiv:2603.14239v1 Announce Type: new Abstract: SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.
Executive Summary
The article introduces QiMeng-CodeV-SVA, a framework that addresses a critical gap in hardware assertion generation by training specialized LLMs for SystemVerilog Assertion (SVA) generation via RTL-grounded bidirectional data synthesis. Traditional NL2SVA approaches suffer from low accuracy due to the scarcity of high-quality SVA corpora and the lack of reliable methods for validating NL-SVA semantic equivalence. The proposed framework mitigates these issues by using large-scale open-source RTLs to ground LLM outputs in real-world hardware contexts and by employing bidirectional translation to filter for semantic consistency. The resulting CodeV-SVA models, particularly CodeV-SVA-14B, achieve strong benchmark results (75.8% Func.@1 on NL2SVA-Human, 84.0% on NL2SVA-Machine), matching or outperforming state-of-the-art LLMs such as GPT-5 and DeepSeek-R1. This represents a significant advance in automating hardware verification.
Key Points
- ▸ Development of a data synthesis framework using RTLs to generate real-world SVAs via LLMs.
- ▸ Use of bidirectional translation as a data selection mechanism to ensure semantic equivalence.
- ▸ Achievement of competitive NL2SVA performance metrics by specialized LLMs trained on synthesized data.
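The bidirectional-translation selection idea above can be sketched in miniature. The following Python sketch is illustrative only: the `translate_nl_to_sva` and `translate_sva_to_nl` functions are hard-coded stubs standing in for LLM calls, and the token-Jaccard similarity with a 0.4 threshold is an assumed consistency proxy, not the paper's actual selection criterion (the RTL-grounded generation stage is omitted entirely).

```python
# Illustrative sketch of bidirectional-translation data selection.
# The two translate_* functions stand in for LLM calls; here they are
# hard-coded stubs so that only the filtering logic itself is exercised.

def translate_nl_to_sva(nl: str) -> str:
    """Placeholder for an LLM mapping a natural-language property to an SVA."""
    return "assert property (@(posedge clk) req |-> ##[1:3] ack);"

def translate_sva_to_nl(sva: str) -> str:
    """Placeholder for an LLM mapping an SVA back to natural language."""
    return "after req is asserted, ack must follow within 1 to 3 cycles"

def token_jaccard(a: str, b: str) -> float:
    """Crude semantic-consistency proxy: Jaccard similarity of token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def keep_pair(nl: str, threshold: float = 0.4) -> bool:
    """Keep an (NL, SVA) pair only if the round-trip NL stays close to the original."""
    sva = translate_nl_to_sva(nl)
    round_trip_nl = translate_sva_to_nl(sva)
    return token_jaccard(nl, round_trip_nl) >= threshold

if __name__ == "__main__":
    nl = "after req is asserted, ack must follow within 1 to 3 cycles"
    print(keep_pair(nl))  # True: the round trip matches the original description
```

A real pipeline would replace the stubs with model calls and a stronger equivalence check; the point of the sketch is only the selection rule, which discards pairs whose round-trip description drifts from the source property.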
Merits
Innovative Data Synthesis
The framework effectively addresses the scarcity of SVA corpora by leveraging open-source RTLs as a proxy for real-world hardware verification scenarios.
Semantic Equivalence Validation
Bidirectional translation offers a novel mechanism to filter out non-equivalent NL-SVA pairs, improving reliability and accuracy.
Performance Validation
CodeV-SVA-14B’s benchmark results validate the effectiveness of the approach, showing parity or superiority over leading LLMs.
Demerits
Dependency on RTL Availability
Performance is contingent upon access to large-scale open-source RTL repositories; limited availability may restrict applicability.
Generalization Concerns
The model’s training is grounded in specific RTL datasets; broader applicability to heterogeneous or proprietary hardware designs remains untested.
Black-Box Complexity
The bidirectional-translation filter adds an opaque selection step to the training pipeline, which may complicate debugging, auditing of the synthesized data, and customization.
Expert Commentary
This work represents a notable shift in the application of LLMs to domain-specific engineering tasks. The integration of RTL grounding and bidirectional translation is particularly noteworthy: it turns general-purpose language models into precision tools for hardware verification. The authors demonstrate not only technical ingenuity but also a clear understanding of the practical constraints of formal verification. By aligning LLM outputs with real-world hardware constraints, they materially improve the reliability of automated verification. Moreover, the performance parity with advanced LLMs suggests that specialized domain adaptation can rival or surpass general-purpose models in targeted applications, setting a precedent for AI-assisted engineering in which domain-specific data curation becomes the cornerstone of model efficacy. The implications extend beyond hardware verification to other engineering domains where AI-generated artifacts must be grounded in physical or formal constraints.
Recommendations
- ✓ Encourage open-source communities to curate and annotate RTL datasets specifically for AI-assisted verification.
- ✓ Develop standardized benchmark suites for evaluating AI-generated SVAs across diverse hardware architectures.