Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
arXiv:2603.02631v1

Abstract: Prompt length is a major bottleneck in agentic large language model (LLM) workloads, where repeated inference steps and multi-call loops …