Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs
arXiv:2603.15803v1
Abstract: Discrete diffusion models offer global context awareness and flexible parallel generation. However, uniform random noise schedulers in standard DLLM training …
Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang