Academic

Academic · 1 min

Self-Distillation for Multi-Token Prediction

arXiv:2603.23911v1 Announce Type: new Abstract: As Large Language Models (LLMs) scale up, inference efficiency becomes a critical bottleneck. Multi-Token Prediction (MTP) could accelerate LLM inference …

Guoliang Zhao, Ruobing Xie, An Wang, Shuaipeng Li, Huaibing Xie, Xingwu Sun
Academic · 1 min

Argument Mining as a Text-to-Text Generation Task

arXiv:2603.23949v1 Announce Type: new Abstract: Argument Mining (AM) aims to uncover the argumentative structures within a text. Previous methods require several subtasks, such as span identification, …

Masayuki Kawarada, Tsutomu Hirao, Wataru Uchida, Masaaki Nagata
Academic · 1 min

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping

arXiv:2603.23998v1 Announce Type: new Abstract: Existing approaches to increasing the effective depth of Transformers predominantly rely on parameter reuse, extending computation through recursive execution. Under …

Yao Chen, Yilong Chen, Yinqi Yang, Junyuan Shang, Zhenyu Zhang, Zefeng Zhang, Shuaiyi Nie, Shuohuan Wang, Yu Sun, Hua Wu, HaiFeng Wang, Tingwen Liu
Academic · 1 min

Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

arXiv:2603.23550v1 Announce Type: new Abstract: Multi-turn human-AI collaboration is fundamental to deploying interactive services such as adaptive tutoring, conversational recommendation, and professional consultation. However, optimizing …

Haoyu Wang, Yuxin Chen, Liang Luo, Buyun Zhang, Ellie Dingqiao Wen, Pan Li