Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
arXiv:2603.18428v1
Abstract: Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or …
Asmita Bhardwaj, Yuya Jeremy Ong, Eelaaf Zahid, Basel Shbita