Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization
arXiv:2603.12565v1 (announce type: cross)
Abstract: SpeechLLMs typically combine ASR-trained encoders with text-based LLM backbones, leading them to inherit written-style output patterns unsuitable for text-to-speech synthesis. …
Mengjie Zhao, Lianbo Liu, Yusuke Fujita, Hao Shi, Yuan Gao, Roman Koshkin, Yui Sudo