EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
arXiv:2603.22910v1 Announce Type: new Abstract: The increasing memory demand of the Key-Value (KV) cache poses a significant bottleneck for Large Language Models (LLMs) in long-context …
Yixuan Wang, Shiyu Ji, Yijun Liu, Qingfu Zhu, Wanxiang Che
3 views