EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
arXiv:2603.06003v1 Announce Type: new
Abstract: Sparse Mixture-of-Experts (SMoE) language models achieve strong capability at low per-token compute, yet deployment remains memory- and throughput-bound because the …
Zongfang Liu, Shengkun Tang, Boyang Sun, Zhiqiang Shen, Xin Yuan