All Articles

Articles

Academic · 1 min

Test-Time Scaling Makes Overtraining Compute-Optimal

arXiv:2604.01411v1 Announce Type: new Abstract: Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of …

Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala
10 views
Academic · 1 min

LinearARD: Linear-Memory Attention Distillation for RoPE Restoration

arXiv:2604.00004v1 Announce Type: cross Abstract: The extension of context windows in Large Language Models is typically facilitated by scaling positional encodings followed by lightweight Continual …

Ning Yang, Hengyu Zhong, Wentao Wang, Baoliang Tian, Haijun Zhang, Jun Wang
4 views
Academic · 1 min

ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor

arXiv:2604.01552v1 Announce Type: new Abstract: Denoising generative models deliver high-fidelity generation but remain bottlenecked by inference latency due to the many iterative denoiser calls required …

Yixiao Wang, Ting Jiang, Zishan Shao, Hancheng Ye, Jingwei Sun, Mingyuan Ma, Jianyi Zhang, Yiran Chen, Hai Li
7 views