TARo: Token-level Adaptive Routing for LLM Test-time Alignment
arXiv:2603.18411v1
Abstract: Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment …
Arushi Rai, Qiang Zhang, Hanqing Zeng, Yunkai Zhang, Dipesh Tamboli, Xiangjun Fan, Zhuokai Zhao