vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM
arXiv:2603.06588v1 (Announce Type: new)
Abstract: Modern artificial intelligence (AI) models are deployed on inference engines to optimize runtime efficiency and resource allocation, particularly for transformer-based …
Ching-Yun Ko, Pin-Yu Chen