Functional Component Ablation Reveals Specialization Patterns in Hybrid Language Model Architectures
arXiv:2603.22473v1 Announce Type: new Abstract: Hybrid language models combining attention with state space models (SSMs) or linear attention offer improved efficiency, but whether both components …