QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
arXiv:2603.14239v1 Announce Type: new Abstract: SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.
Executive Summary
The article introduces QiMeng-CodeV-SVA, a framework that addresses a critical gap in hardware assertion generation by training specialized LLMs for SystemVerilog Assertion (SVA) generation via RTL-grounded bidirectional data synthesis. Traditional NL2SVA approaches suffer from low accuracy due to the scarcity of high-quality SVA corpora and the lack of reliable methods for validating NL-SVA semantic equivalence. The proposed framework mitigates these issues by using large-scale open-source RTLs to ground LLM outputs in real-world hardware contexts and by employing bidirectional translation to filter for semantic consistency. The resulting CodeV-SVA models, particularly CodeV-SVA-14B, achieve strong benchmark results (75.8% Func.@1 on NL2SVA-Human, 84.0% on NL2SVA-Machine), matching or outperforming state-of-the-art LLMs such as GPT-5 and DeepSeek-R1. This represents a significant advance in automating hardware verification.
Key Points
- ▸ Development of a data synthesis framework using RTLs to generate real-world SVAs via LLMs.
- ▸ Use of bidirectional translation as a data selection mechanism to ensure semantic equivalence.
- ▸ Achievement of competitive NL2SVA performance metrics by specialized LLMs trained on synthesized data.
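The bidirectional-translation selection idea above can be sketched in miniature. The following Python sketch is illustrative only: the `translate_nl_to_sva` and `translate_sva_to_nl` functions are hard-coded stubs standing in for LLM calls, and the token-Jaccard similarity with a 0.4 threshold is an assumed consistency proxy, not the paper's actual selection criterion (the RTL-grounded generation stage is omitted entirely).

```python
# Illustrative sketch of bidirectional-translation data selection.
# The two translate_* functions stand in for LLM calls; here they are
# hard-coded stubs so that only the filtering logic itself is exercised.

def translate_nl_to_sva(nl: str) -> str:
    """Placeholder for an LLM mapping a natural-language property to an SVA."""
    return "assert property (@(posedge clk) req |-> ##[1:3] ack);"

def translate_sva_to_nl(sva: str) -> str:
    """Placeholder for an LLM mapping an SVA back to natural language."""
    return "after req is asserted, ack must follow within 1 to 3 cycles"

def token_jaccard(a: str, b: str) -> float:
    """Crude semantic-consistency proxy: Jaccard similarity of token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def keep_pair(nl: str, threshold: float = 0.4) -> bool:
    """Keep an (NL, SVA) pair only if the round-trip NL stays close to the original."""
    sva = translate_nl_to_sva(nl)
    round_trip_nl = translate_sva_to_nl(sva)
    return token_jaccard(nl, round_trip_nl) >= threshold

if __name__ == "__main__":
    nl = "after req is asserted, ack must follow within 1 to 3 cycles"
    print(keep_pair(nl))  # True: the round trip matches the original description
```

A real pipeline would replace the stubs with model calls and a stronger equivalence check; the point of the sketch is only the selection rule, which discards pairs whose round-trip description drifts from the source property.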
Merits
Innovative Data Synthesis
The framework effectively addresses the scarcity of SVA corpora by leveraging open-source RTLs as a proxy for real-world hardware verification scenarios.
Semantic Equivalence Validation
Bidirectional translation offers a novel mechanism to filter out non-equivalent NL-SVA pairs, improving reliability and accuracy.
Performance Validation
CodeV-SVA-14B’s benchmark results validate the effectiveness of the approach, showing parity or superiority over leading LLMs.
Demerits
Dependency on RTL Availability
Performance is contingent upon access to large-scale open-source RTL repositories; limited availability may restrict applicability.
Generalization Concerns
The model’s training is grounded in specific RTL datasets; broader applicability to heterogeneous or proprietary hardware designs remains untested.
Black-Box Complexity
The bidirectional-translation filter adds an opaque selection step to the training pipeline, which may complicate debugging, auditing of the synthesized data, and customization.
Expert Commentary
This work represents a notable shift in the application of LLMs to domain-specific engineering tasks. The integration of RTL grounding and bidirectional translation is particularly noteworthy: it turns general-purpose language models into precision tools for hardware verification. The authors demonstrate not only technical ingenuity but also a clear understanding of the practical constraints of formal verification. By aligning LLM outputs with real-world hardware constraints, they materially improve the reliability of automated verification. Moreover, the performance parity with advanced LLMs suggests that specialized domain adaptation can rival or surpass general-purpose models in targeted applications, setting a precedent for AI-assisted engineering in which domain-specific data curation becomes the cornerstone of model efficacy. The implications extend beyond hardware verification to other engineering domains where AI-generated artifacts must be grounded in physical or formal constraints.
Recommendations
- ✓ Encourage open-source communities to curate and annotate RTL datasets specifically for AI-assisted verification.
- ✓ Develop standardized benchmark suites for evaluating AI-generated SVAs across diverse hardware architectures.