Academic · 1 min

MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning

arXiv:2604.01694v1. Abstract: Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces …

Sten Rüdiger, Sebastian Raschka
3 views
Academic · 1 min

Coupled Query-Key Dynamics for Attention

arXiv:2604.01683v1. Abstract: Standard scaled dot-product attention computes scores from static, independent projections of the input. We show that evolving queries and keys …

Barak Gahtan, Alex M. Bronstein
5 views
Academic · 1 min

Label Shift Estimation With Incremental Prior Update

arXiv:2604.01651v1. Abstract: An assumption often made in supervised learning is that the training and testing sets have the same label distribution. However, …

Yunrui Zhang, Gustavo Batista, Salil S. Kanhere
1 view
Academic · 1 min

Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models

arXiv:2604.01622v1. Abstract: Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from …

Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu
1 view