BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design
arXiv:2604.03957v1 Announce Type: new
Abstract: Ultra-low-bit quantization brings substantial efficiency gains to Transformer-based models, but accuracy degradation and limited GPU support hinder its wide …
Yifu Ding, Xianglong Liu, Shenghao Jin, Jinyang Guo, Jiwen Lu