IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
arXiv:2603.12201v1 Announce Type: new Abstract: Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both …
Yushi Bai, Qian Dong, Ting Jiang, Xin Lv, Zhengxiao Du, Aohan Zeng, Jie Tang, Juanzi Li
8 views