Academic

Academic

Academic · 1 min

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

arXiv:2604.06260v1 Announce Type: new Abstract: Test-time scaling investigates whether a fixed diffusion language model (DLM) can generate better outputs when given more inference compute, without …

Ahsan Bilal, Muhammad Ahmed Mohsin, Muhammad Umer, Asad Aali, Muhammad Usman Khanzada, Muhammad Usman Rafique, Zihao He, Emily Fox, Dean F. Hougen
55 views
Academic · 1 min

Does a Global Perspective Help Prune Sparse MoEs Elegantly?

arXiv:2604.06542v1 Announce Type: new Abstract: Empirical scaling laws for language models have encouraged the development of ever-larger LLMs, despite their growing computational and memory costs. …

Zeliang Zhang, Nikhil Ghosh, Jiani Liu, Bin Yu, Xiaodong Liu
48 views
Academic · 1 min

DiffuMask: Diffusion Language Model for Token-level Prompt Pruning

arXiv:2604.06627v1 Announce Type: new Abstract: In-Context Learning and Chain-of-Thought prompting improve reasoning in large language models (LLMs). These typically come at the cost of longer, …

Caleb Zheng, Jyotika Singh, Fang Tu, Weiyi Sun, Sujeeth Bharadwaj, Yassine Benajiba, Sujith Ravi, Eli Shlizerman, Dan Roth
49 views