Articles

Academic · 1 min

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

arXiv:2603.00040v1. Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main …

Peiyuan Zhang, Matthew Noto, Wenxuan Tan, Chengquan Jiang, Will Lin, Wei Zhou, Hao Zhang
Academic · 1 min

Breaking the Factorization Barrier in Diffusion Language Models

arXiv:2603.00045v1. Abstract: Diffusion language models theoretically allow for efficient parallel generation but are practically hindered by the "factorization barrier": the assumption that …

Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu
Academic · 1 min

Mag-Mamba: Modeling Coupled Spatiotemporal Asymmetry for POI Recommendation

arXiv:2603.00053v1. Abstract: Next Point-of-Interest (POI) recommendation is a critical task in location-based services, yet it faces the fundamental challenge of coupled spatiotemporal …

Zhuoxuan Li, Tangwei Ye, Jieyuan Pei, Haina Liang, Zhongyuan Lai, Zihan Liu, Yiming Wu, Qi Zhang, Liang Hu
Academic · 1 min

Expert Divergence Learning for MoE-based Language Models

arXiv:2603.00054v1. Abstract: The Mixture-of-Experts (MoE) architecture is a powerful technique for scaling language models, yet it often suffers from expert homogenization, where …

Jiaang Li, Haibin Chen, Langming Liu, Yujin Yuan, Yadao Wang, Yizhen Zhang, Chengting Yu, Xin Tong, Weidong Zhang, Shilei Liu, Wenbo Su, Bo Zheng