Beyond Token Eviction: Mixed-Dimension Budget Allocation for Efficient KV Cache Compression
arXiv:2603.20616v1. Abstract: Key-value (KV) caching is widely used to accelerate transformer inference, but its memory cost grows linearly with input length, limiting …
Ruijie Miao, Zhiming Wang, Wang Li, Shiwei Wu, Shufan Liu, Yanbing Jiang, Tong Yang