RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
arXiv:2603.13289v1 (Announce Type: new)
Abstract: The increasing complexity of AI tasks has shifted the paradigm from monolithic models toward multi-agent large language model (LLM) systems. …
Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, Jiang Liu