RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery
arXiv:2603.16411v1 (Announce Type: new). Abstract: Entity recognition in Automatic Speech Recognition (ASR) is challenging for rare and domain-specific terms. In domains such as finance, medicine, and air traffic control, these errors are costly. If the entities are entirely absent from the ASR output, post-ASR correction becomes difficult. To address this, we introduce RECOVER, an agentic correction framework that operates as a tool-using agent. It leverages multiple ASR hypotheses as evidence, retrieves relevant entities, and applies Large Language Model (LLM) correction under constraints. The hypotheses are exploited through four strategies: 1-Best, Entity-Aware Select, Recognizer Output Voting Error Reduction (ROVER) Ensemble, and LLM-Select. Evaluated across five diverse datasets, RECOVER achieves 8-46% relative reductions in entity-phrase word error rate (E-WER) and increases recall by up to 22 percentage points. LLM-Select achieves the best overall entity-correction performance while maintaining overall WER.
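The two simplest hypothesis strategies named in the abstract can be sketched as follows. The abstract does not define how Entity-Aware Select scores hypotheses, so the rule used here (count whole-word matches against a domain entity lexicon, break ties by n-best rank) is a hypothetical stand-in, and all function names are illustrative rather than taken from the paper.

```python
from typing import Sequence


def one_best(hypotheses: Sequence[str]) -> str:
    """1-Best: return the recognizer's top-ranked hypothesis unchanged."""
    return hypotheses[0]


def count_entity_hits(hypothesis: str, entity_lexicon: Sequence[str]) -> int:
    """Count lexicon entries occurring as whole-word phrases (case-insensitive)."""
    padded = f" {' '.join(hypothesis.lower().split())} "
    return sum(1 for entity in entity_lexicon
               if f" {' '.join(entity.lower().split())} " in padded)


def entity_aware_select(hypotheses: Sequence[str],
                        entity_lexicon: Sequence[str]) -> str:
    """Hypothetical Entity-Aware Select: prefer the hypothesis with the most
    lexicon hits; ties go to the higher-ranked (earlier) hypothesis."""
    best_index = max(
        range(len(hypotheses)),
        key=lambda i: (count_entity_hits(hypotheses[i], entity_lexicon), -i),
    )
    return hypotheses[best_index]


n_best = ["shares of in video rose", "shares of nvidia rose"]
print(one_best(n_best))                         # shares of in video rose
print(entity_aware_select(n_best, ["Nvidia"]))  # shares of nvidia rose
```

The example shows the failure mode the abstract describes: the entity is missing from the 1-best output but present lower in the n-best list, where a selection strategy can still reach it.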
Executive Summary
The article introduces RECOVER, an agentic correction framework that targets entity errors in Automatic Speech Recognition (ASR) output. RECOVER draws on multiple ASR hypotheses, retrieves relevant entities, and applies Large Language Model (LLM) correction under constraints. Across five datasets, it reduces entity-phrase word error rate (E-WER) by 8-46% relative and raises recall by up to 22 percentage points. The LLM-Select strategy performs best on entity correction while maintaining overall word error rate (WER). These results matter most in domains where entity errors are costly, such as finance, medicine, and air traffic control.
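To make the reported metrics concrete, here is a minimal sketch of word error rate, an entity-phrase variant, and entity recall. The exact E-WER and recall definitions are not given in this summary, so the sketch assumes E-WER is the word edit distance summed over aligned (reference entity, hypothesis span) pairs divided by the number of reference entity words, and that recall counts reference entities found verbatim in the output. Text is assumed to be normalized (for example, lowercased) beforehand.

```python
from typing import Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # delete r
                          curr[j - 1] + 1,          # insert h
                          prev[j - 1] + (r != h))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by reference length."""
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / max(len(ref), 1)


def entity_wer(pairs: Sequence[Tuple[str, str]]) -> float:
    """Assumed E-WER: word edits summed over (reference entity phrase, aligned
    hypothesis span) pairs, divided by the total number of reference entity words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / max(words, 1)


def entity_recall(reference_entities: Sequence[str], hypothesis: str) -> float:
    """Assumed recall: share of reference entities found as whole-word phrases."""
    padded = f" {' '.join(hypothesis.split())} "
    found = sum(1 for e in reference_entities if f" {' '.join(e.split())} " in padded)
    return found / max(len(reference_entities), 1)


pairs = [("goldman sachs", "goldman sax"), ("nvidia", "nvidia")]
print(entity_wer(pairs))  # 1 error over 3 entity words
print(entity_recall(["goldman sachs", "nvidia"], "shares of nvidia rose"))  # 0.5
```

Scoring only entity spans is what lets E-WER move substantially while overall WER stays roughly flat: entity words are a small fraction of all words in a transcript.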
Key Points
- ▸ RECOVER is an agentic correction framework for entity recognition in ASR
- ▸ The framework combines multiple ASR hypotheses, entity retrieval, and constrained LLM correction
- ▸ RECOVER cuts E-WER by 8-46% (relative) and raises recall by up to 22 percentage points across five datasets
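The abstract says LLM correction is applied "under constraints" but this summary does not spell them out. One plausible guard, sketched here purely as an assumption, accepts the LLM's rewrite only if it adds few words and every added word is attested in some ASR hypothesis or a retrieved entity; otherwise the selected hypothesis is kept. The budget `max_added_words` and both function names are hypothetical.

```python
import difflib
from typing import List, Sequence


def introduced_words(source: str, candidate: str) -> List[str]:
    """Words in the inserted or replaced regions of candidate relative to source."""
    src, cand = source.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=src, b=cand, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            added.extend(cand[j1:j2])
    return added


def accept_correction(selected: str, llm_output: str,
                      hypotheses: Sequence[str], retrieved_entities: Sequence[str],
                      max_added_words: int = 4) -> str:
    """Hypothetical guard: keep the LLM edit only if it adds at most
    max_added_words words and each one appears in a hypothesis or retrieved entity."""
    allowed = {w for h in hypotheses for w in h.split()}
    allowed |= {w for e in retrieved_entities for w in e.split()}
    added = introduced_words(selected, llm_output)
    if len(added) > max_added_words:
        return selected  # edit too large: fall back to the ASR hypothesis
    if any(w not in allowed for w in added):
        return selected  # unsupported word: reject as a possible hallucination
    return llm_output


sel = "shares of in video rose"
hyps = [sel, "shares of and video rose"]
print(accept_correction(sel, "shares of nvidia rose", hyps, ["nvidia"]))    # accepted
print(accept_correction(sel, "shares of nvidia soared", hyps, ["nvidia"]))  # falls back
```

The design choice is conservative: an LLM may only introduce words that some piece of evidence supports, which trades a little recall for protection against fluent but unsupported rewrites.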
Merits
Robust Entity Correction
RECOVER's use of multiple hypotheses and LLM correction enables robust entity correction, particularly for rare and domain-specific terms.
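The abstract states that RECOVER retrieves relevant entities but does not describe the retriever here. As an assumption, the sketch below uses character-level fuzzy matching (Python's `difflib.SequenceMatcher`) between a domain entity lexicon and word n-grams drawn from every hypothesis, so a rare term that is misspelled differently across hypotheses can still be retrieved. The cutoff of 0.75 and the n-gram length are arbitrary illustrative values, and `retrieve_entities` is a hypothetical name.

```python
import difflib
from typing import Dict, List, Sequence


def ngrams(words: Sequence[str], max_n: int) -> List[str]:
    """All contiguous word n-grams of length 1..max_n, joined without spaces."""
    return ["".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def retrieve_entities(hypotheses: Sequence[str], lexicon: Sequence[str],
                      cutoff: float = 0.75, max_n: int = 3) -> Dict[str, float]:
    """Hypothetical retriever: score each lexicon entity by its best character-level
    similarity to any n-gram of any hypothesis; keep scores at or above cutoff."""
    spans = {g for h in hypotheses for g in ngrams(h.lower().split(), max_n)}
    scores: Dict[str, float] = {}
    for entity in lexicon:
        key = "".join(entity.lower().split())
        best = max((difflib.SequenceMatcher(None, key, s).ratio() for s in spans),
                   default=0.0)
        if best >= cutoff:
            scores[entity] = best
    # highest-scoring entities first
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


hyps = ["shares of goldman sax rose", "shares of goldman sacks rose"]
print(retrieve_entities(hyps, ["Goldman Sachs", "Nvidia"]))  # only Goldman Sachs survives
```

Joining n-grams without spaces lets a split rendering such as "goldman sax" still match a single lexicon entry; the retrieved list can then feed a selection strategy or a constrained correction step.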
Evidence-based Recovery
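By grounding corrections in evidence drawn from multiple ASR hypotheses, rather than in the LLM's priors alone, the framework can recover an entity that is missing from the 1-best output but present in another hypothesis.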
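Combining hypotheses as evidence is what the ROVER Ensemble strategy does. Standard ROVER first aligns the hypotheses into a word transition network and then votes on each slot, optionally weighting by confidence scores. The sketch below skips alignment: it assumes the hypotheses arrive pre-aligned (equal length, with an empty string marking a gap) and uses plain frequency voting with ties broken by hypothesis rank. It is a simplification, not RECOVER's implementation.

```python
from collections import Counter
from typing import List, Sequence

NULL = ""  # placeholder for an empty (deletion) slot in the alignment


def rover_vote(aligned: Sequence[Sequence[str]]) -> str:
    """Simplified ROVER-style voting over hypotheses that are ALREADY aligned
    slot by slot. Each slot takes its most frequent word; ties go to the word
    from the highest-ranked hypothesis; winning NULL slots are dropped."""
    if len({len(h) for h in aligned}) != 1:
        raise ValueError("hypotheses must be non-empty and aligned to one length")
    output: List[str] = []
    for slot in zip(*aligned):
        counts = Counter(slot)
        top = max(counts.values())
        # first word in rank order that reaches the top count
        winner = next(w for w in slot if counts[w] == top)
        if winner != NULL:
            output.append(winner)
    return " ".join(output)


aligned = [
    ["shares", "of", "in", "video", "rose"],    # 1-best
    ["shares", "of", "nvidia", NULL, "rose"],
    ["shares", "of", "nvidia", NULL, "rose"],
]
print(rover_vote(aligned))  # shares of nvidia rose
```

Here two lower-ranked hypotheses outvote the 1-best on the entity slot, which is the mechanism by which voting can restore an entity that the top hypothesis dropped.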
Scalability and Flexibility
RECOVER's modular design, with interchangeable hypothesis strategies and an entity-retrieval step, should allow it to scale and adapt across diverse domains and datasets.
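The modularity claim can be illustrated by treating each hypothesis strategy as an interchangeable callable keyed by name. The sketch below includes a hypothetical LLM-Select that formats the n-best list as a numbered choice; `llm` is any prompt-to-text callable supplied by the caller (an assumption, not a specific vendor API), and the prompt wording is invented for illustration.

```python
from typing import Callable, Dict, Sequence

# A strategy maps (n-best hypotheses, retrieved entities) to one transcript.
Strategy = Callable[[Sequence[str], Sequence[str]], str]


def build_select_prompt(hypotheses: Sequence[str], entities: Sequence[str]) -> str:
    """Format the n-best list and retrieved entities as a numbered choice."""
    lines = ["Pick the most plausible transcript.",
             "Relevant domain entities: " + ", ".join(entities)]
    lines += [f"{i}: {h}" for i, h in enumerate(hypotheses)]
    lines.append("Reply with the number only.")
    return "\n".join(lines)


def make_llm_select(llm: Callable[[str], str]) -> Strategy:
    """Hypothetical LLM-Select: ask `llm` for an index; any reply that is not a
    valid index falls back to the 1-best hypothesis."""
    def select(hypotheses: Sequence[str], entities: Sequence[str]) -> str:
        reply = llm(build_select_prompt(hypotheses, entities)).strip()
        if reply.isdecimal() and int(reply) < len(hypotheses):
            return hypotheses[int(reply)]
        return hypotheses[0]
    return select


def one_best(hypotheses: Sequence[str], entities: Sequence[str]) -> str:
    """1-Best ignores the retrieved entities."""
    return hypotheses[0]


def build_registry(llm: Callable[[str], str]) -> Dict[str, Strategy]:
    """Strategies are interchangeable entries keyed by name."""
    return {"1-best": one_best, "llm-select": make_llm_select(llm)}


fake_llm = lambda prompt: "1"  # stand-in for a real model call
registry = build_registry(fake_llm)
print(registry["llm-select"](["shares of in video rose", "shares of nvidia rose"], ["Nvidia"]))
```

Injecting the model as a plain callable keeps the pipeline testable offline and lets a deployment swap models or strategies without touching the surrounding code.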
Demerits
Limited Evaluation Datasets
The evaluation covers five datasets; although diverse, they may not capture the full range of domains and conditions encountered in real-world deployments.
Dependence on ASR Performance
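RECOVER's gains depend on the quality of the ASR hypotheses. As the abstract notes, when an entity is entirely absent from the ASR output, post-ASR correction becomes difficult, and noisy audio or unfamiliar domains may make this more likely.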
Lack of Human Evaluation
The study does not include human evaluation of the corrected entities, which is essential for assessing the framework's overall effectiveness and accuracy.
Expert Commentary
The article makes a significant contribution to entity correction in ASR, using LLMs for constrained correction and achieving substantial reductions in E-WER alongside gains in recall. Its limitations, notably the dependence on ASR hypothesis quality and the absence of human evaluation, should be addressed in future work, but RECOVER's modular design and evidence-based recovery make it a valuable tool for domain-specific applications. In practice, the results support pairing ASR systems with entity-aware post-correction in high-stakes settings and motivate further research on domain-specific speech and language processing.
Recommendations
- ✓ Future studies should evaluate RECOVER's performance on a broader range of datasets and domains, including more challenging scenarios and real-world applications.
- ✓ Investigating the use of human evaluation to assess the framework's overall effectiveness and accuracy is essential for further validation and improvement.