MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
arXiv:2603.09983v1 (announce type: cross)
Abstract: Mixture-of-Experts (MoE) models enable scalable performance but face severe memory constraints on edge devices. Existing offloading strategies struggle with I/O …
Shuhuai Li, Jianghao Lin, Dongdong Ge, Yinyu Ye