The Diminishing Returns of Early-Exit Decoding in Modern LLMs
arXiv:2603.23701v1
Abstract: In Large Language Model (LLM) inference, early-exit refers to stopping computation at an intermediate layer once the prediction is sufficiently …
Rui Wei, Rui Du, Hanfei Yu, Devesh Tiwari, Jian Li, Zhaozhuo Xu, Hao Wang