
Articles

Academic · 1 min

Self-Distillation for Multi-Token Prediction

arXiv:2603.23911v1 Announce Type: new Abstract: As Large Language Models (LLMs) scale up, inference efficiency becomes a critical bottleneck. Multi-Token Prediction (MTP) could accelerate LLM inference …

Guoliang Zhao, Ruobing Xie, An Wang, Shuaipeng Li, Huaibing Xie, Xingwu Sun
Academic · 1 min

Argument Mining as a Text-to-Text Generation Task

arXiv:2603.23949v1 Announce Type: new Abstract: Argument Mining (AM) aims to uncover the argumentative structures within a text. Previous methods require several subtasks, such as span identification, …

Masayuki Kawarada, Tsutomu Hirao, Wataru Uchida, Masaaki Nagata
Academic · 1 min

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping

arXiv:2603.23998v1 Announce Type: new Abstract: Existing approaches to increasing the effective depth of Transformers predominantly rely on parameter reuse, extending computation through recursive execution. Under …

Yao Chen, Yilong Chen, Yinqi Yang, Junyuan Shang, Zhenyu Zhang, Zefeng Zhang, Shuaiyi Nie, Shuohuan Wang, Yu Sun, Hua Wu, HaiFeng Wang, Tingwen Liu