dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models
arXiv:2603.18806v1 (announce type: new)
Abstract: Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning …