Retrieval-Augmented Self-Taught Reasoning Model with Adaptive Chain-of-Thought for ASR Named Entity Correction

arXiv:2602.12287v1 (cross-list). Abstract: End-to-end automatic speech recognition (ASR) systems frequently misrecognize domain-specific phrases like named entities, which can cause catastrophic failures in downstream tasks. A new family of named entity correction methods based on large language models (LLMs) has recently emerged. However, these approaches have yet to fully exploit the sophisticated reasoning capabilities inherent to LLMs. To bridge this gap, we propose a novel retrieval-augmented generation framework for correcting named entity errors in ASR. Our approach consists of two key components: (1) a rephrasing language model (RLM) for named entity recognition, followed by candidate retrieval using a phonetic-level edit distance; and (2) a novel self-taught reasoning model with adaptive chain-of-thought (A-STAR) that dynamically adjusts the depth of its reasoning based on task difficulty. Experiments on the AISHELL-1 and Homophone datasets demonstrate the effectiveness of our method, which achieves relative reductions in the named entity character error rate of 17.96% and 34.42%, respectively, compared to a strong baseline.

Executive Summary

The article presents a retrieval-augmented generation framework designed to correct named entity errors in automatic speech recognition (ASR) systems. The proposed method combines a rephrasing language model (RLM) for named entity recognition, candidate retrieval based on phonetic-level edit distance, and a self-taught reasoning model with adaptive chain-of-thought (A-STAR) that dynamically adjusts its reasoning depth based on task difficulty. Experiments on the AISHELL-1 and Homophone datasets show relative reductions in named entity character error rate of 17.96% and 34.42%, respectively, compared to a strong baseline. This research highlights the potential of leveraging large language models (LLMs) to improve the accuracy of ASR systems in domain-specific contexts.
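To make the retrieval step concrete, the sketch below ranks lexicon entries by Levenshtein distance over phoneme sequences. The phoneme lexicon and example pronunciations are hypothetical illustrations, not the authors' data or implementation; the paper's phonetic-level edit distance may differ in its cost model.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance between two phoneme sequences."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))          # dp[j] = distance(a[:i], b[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i       # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # delete from a
                        dp[j - 1] + 1,                   # insert into a
                        prev + (a[i - 1] != b[j - 1]))   # substitute
            prev = cur
    return dp[n]

# Hypothetical phonetic lexicon: entity -> phoneme sequence.
LEXICON = {
    "Shanghai": ["SH", "AE", "NG", "HH", "AY"],
    "Shenzhen": ["SH", "EH", "N", "JH", "EH", "N"],
    "Shandong": ["SH", "AE", "N", "D", "AO", "NG"],
}

def retrieve_candidates(asr_phones, k=2):
    """Return the k lexicon entries phonetically closest to the ASR span."""
    ranked = sorted(LEXICON, key=lambda w: edit_distance(asr_phones, LEXICON[w]))
    return ranked[:k]
```

The retrieved candidates would then be handed to the reasoning model for final selection, e.g. `retrieve_candidates(["SH", "AE", "NG", "HH", "AY"])` returns `["Shanghai", ...]` since its distance to that pronunciation is zero.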

Key Points

  • Introduction of a retrieval-augmented generation framework for ASR named entity correction.
  • Utilization of a rephrasing language model (RLM) for named entity recognition.
  • Development of a self-taught reasoning model with adaptive chain-of-thought (A-STAR).
  • Significant improvements in named entity character error rates on benchmark datasets.

Merits

Innovative Framework

The proposed framework effectively combines retrieval-augmented generation with adaptive reasoning, addressing a critical gap in current ASR error correction methods.

Empirical Validation

The method is rigorously tested on two diverse datasets, demonstrating substantial improvements in error rates, which strengthens the credibility of the findings.

Dynamic Reasoning

The adaptive chain-of-thought approach allows the model to adjust its reasoning depth based on task difficulty, enhancing its flexibility and effectiveness.
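One plausible way to gate reasoning depth is to treat the score margin between the top retrieval candidates as a difficulty proxy: a clear winner can be answered directly, while a tight margin triggers full chain-of-thought. The threshold, routing logic, and prompt wording below are illustrative assumptions, not the A-STAR mechanism as specified in the paper.

```python
def choose_reasoning_depth(candidate_scores, margin_threshold=0.2):
    """Pick a chain-of-thought budget from the retrieval score margin.

    A large gap between the best and second-best candidate suggests an
    easy correction (answer directly); a small gap suggests ambiguity,
    so the model spends more reasoning steps.
    """
    ranked = sorted(candidate_scores, reverse=True)
    if len(ranked) < 2 or ranked[0] - ranked[1] >= margin_threshold:
        return "direct"      # no explicit chain-of-thought
    return "long_cot"        # full step-by-step reasoning

def build_prompt(sentence, candidates, depth):
    """Assemble an LLM prompt matching the chosen reasoning depth."""
    if depth == "direct":
        return (f"Correct the named entity in: {sentence}. "
                f"Candidates: {candidates}. Answer with the entity only.")
    return (f"Correct the named entity in: {sentence}. "
            f"Candidates: {candidates}. Reason step by step about "
            f"phonetic similarity before answering.")
```

Under this sketch, scores like `[0.9, 0.3]` route to the cheap direct path, while `[0.51, 0.49]` routes to long chain-of-thought, which is where the computational-overhead concern raised below would concentrate.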

Demerits

Limited Generalizability

While the results are promising, the study is limited to specific datasets and may not generalize to other domains or languages without further validation.

Computational Complexity

The adaptive reasoning model may introduce additional computational overhead, which could be a limitation for real-time ASR applications.

Dependency on LLMs

The effectiveness of the method is highly dependent on the performance of the underlying large language models, which may not always be reliable or accurate.

Expert Commentary

The article presents a significant advancement in the field of ASR named entity correction by introducing a retrieval-augmented generation framework that dynamically adjusts its reasoning depth. The integration of a rephrasing language model (RLM) and a self-taught reasoning model with adaptive chain-of-thought (A-STAR) demonstrates a sophisticated approach to addressing the challenges of named entity recognition errors. The empirical results on the AISHELL-1 and Homophone datasets are particularly noteworthy, showcasing substantial improvements in error rates. However, the study's limitations, such as potential computational complexity and dependency on LLMs, warrant further investigation. The practical implications of this research are far-reaching, particularly in domains where accurate ASR is crucial. Additionally, the findings contribute to the broader discourse on the applications of LLMs in NLP and speech technology, highlighting the need for continued research and development in this area.

Recommendations

  • Further validation of the proposed framework on a wider range of datasets and languages to assess its generalizability.
  • Exploration of methods to optimize the computational efficiency of the adaptive reasoning model for real-time applications.
