An experimental study of KV cache reuse strategies in chunk-level caching systems
arXiv:2603.20218v1. Abstract: Retrieval-augmented generation improves the accuracy of large language models by adding relevant retrieved text to the prompt. Chunk-level caching (CLC) accelerates …
Samuel Cestola, Tianxiang Xia, Zheng Weiyan, Zheng Pengfei, Diego Didona