TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
arXiv:2603.19296v1 (announce type: new)

Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced. However, since …
Toshiaki Koike-Akino, Jing Liu, Ye Wang