Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing
arXiv:2603.11535v1
Abstract: Token-choice Mixture-of-Experts (TC-MoE) routes each token to a fixed number of experts, limiting dynamic computation allocation and requiring auxiliary losses …
Hanchi Sun, Yixin Liu, Yonghui Wu, Lichao Sun
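As a rough illustration of the token-choice routing the abstract describes, the sketch below sends each token to its top-k experts and then counts how many tokens each expert receives. It shows only the generic top-k baseline, not the paper's method. The function names, the router logits, and the choice to renormalize gate weights over the selected experts are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's expert logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_choice_route(router_logits, k):
    """Token-choice routing: every token picks exactly k experts.

    router_logits: one list of expert logits per token.
    Returns, per token, a list of (expert_index, gate_weight) pairs.
    Gate weights are renormalized over the k chosen experts
    (an assumption; implementations differ on this detail).
    """
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
        top = ranked[:k]
        norm = sum(probs[e] for e in top)
        assignments.append([(e, probs[e] / norm) for e in top])
    return assignments


def expert_load(assignments, num_experts):
    """Count how many tokens were routed to each expert."""
    load = [0] * num_experts
    for token in assignments:
        for expert, _ in token:
            load[expert] += 1
    return load


# Hypothetical router logits: 4 tokens, 3 experts.
router_logits = [
    [2.0, 0.5, -1.0],
    [1.5, 1.0, 0.0],
    [3.0, -0.5, 0.2],
    [0.1, 0.0, 2.5],
]

assignments = token_choice_route(router_logits, k=2)
load = expert_load(assignments, num_experts=3)
print(load)  # [4, 2, 2]
```

Every token uses exactly k = 2 experts regardless of how hard it is, which is the fixed per-token compute the abstract points to. In this toy input expert 0 receives all four tokens while the other two receive two each; uneven loads of this kind are what auxiliary balancing losses are typically added to counteract.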