AIMER: Calibration-Free Task-Agnostic MoE Pruning
arXiv:2603.18492v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) language models increase parameter capacity without a proportional increase in per-token compute, but deployment still requires storing all experts, making …
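The abstract is truncated here and does not describe the AIMER method itself, but the property it opens with is standard: in a top-k MoE layer, total parameters grow with the number of experts while each token only activates k of them, so per-token FLOPs stay roughly fixed even as capacity grows, yet every expert must still be held in memory, which is what motivates expert pruning. Below is a minimal illustrative sketch of such a layer in PyTorch; the class name `TopKMoE` and all hyperparameters are my own placeholders, not anything from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative sketch only).

    All `num_experts` FFN experts must be stored, but each token is
    processed by only `k` of them, so per-token compute scales with k,
    not with num_experts.
    """

    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                     # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # each token picks k experts
        weights = F.softmax(weights, dim=-1)        # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routing slot-th choice to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    moe = TopKMoE()
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```

Even in this toy layer, the memory/compute asymmetry is visible: storage is proportional to `num_experts` linear layers, while each token's forward pass touches only `k` of them.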