LLM-as-Judge on a Budget
arXiv:2602.15481v1 Announce Type: new Abstract: LLM-as-a-judge has emerged as a cornerstone technique for evaluating large language models by leveraging LLM reasoning to score prompt-response pairs. …
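The scoring loop this abstract refers to can be sketched as follows. This is an illustration of the general LLM-as-a-judge pattern, not this paper's method: `call_llm` is a hypothetical stand-in for any chat-completion API, and the 1–10 rubric is an assumption.

```python
import re

# Hypothetical judge prompt; the rubric and format are assumptions.
JUDGE_TEMPLATE = (
    "You are an impartial judge. Rate the response to the prompt below "
    "on a scale of 1 to 10, then end your answer with 'Score: <n>'.\n\n"
    "Prompt: {prompt}\nResponse: {response}"
)

def parse_score(judge_output: str):
    """Extract the trailing 'Score: n' from the judge model's output."""
    match = re.search(r"Score:\s*(\d+)", judge_output)
    return int(match.group(1)) if match else None

def judge_pair(prompt: str, response: str, call_llm):
    """Score one prompt-response pair with a judge model.

    call_llm: any callable mapping a prompt string to a completion string.
    """
    judge_output = call_llm(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
    return parse_score(judge_output)
```

In practice the judge call dominates cost, which is why budget-aware variants focus on reducing the number or size of these calls.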
All Articles
arXiv:2602.15499v1 Announce Type: new Abstract: It has been shown that a neural network's Lipschitz constant can be leveraged to derive robustness guarantees, to improve generalizability …
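One common way such guarantees are derived (an illustration of the general idea, not necessarily this paper's construction) is to upper-bound a feedforward network's Lipschitz constant by the product of its layers' spectral norms, since 1-Lipschitz activations such as ReLU cannot increase it:

```python
import numpy as np

def lipschitz_upper_bound(weight_matrices):
    """Upper-bound the Lipschitz constant of a ReLU network as the
    product of the layers' spectral norms (largest singular values)."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, ord=2)
    return bound

# With bound L, an input perturbation of norm eps can move the
# network's output by at most L * eps, which is the basis for
# certified robustness radii.
```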
arXiv:2602.15503v1 Announce Type: new Abstract: Stability and robustness are critical for deploying Transformers in safety-sensitive settings. A principled way to enforce such behavior is to …
arXiv:2602.15510v1 Announce Type: new Abstract: Federated Learning (FL) enables distributed training across multiple clients without centralized data sharing, while Graph Neural Networks (GNNs) model relational …
arXiv:2602.15515v1 Announce Type: new Abstract: Training against white-box deception detectors has been proposed as a way to make AI systems honest. However, such training risks …
arXiv:2602.15546v1 Announce Type: new Abstract: The ability to accurately perform counterfactual inference on time series is crucial for decision-making in fields like finance, healthcare, and …
arXiv:2602.15563v1 Announce Type: new Abstract: Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at …
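A minimal sketch of the core QAT mechanism (generic, not this paper's specific scheme): weights are "fake-quantized" in the forward pass, while the backward pass treats the rounding as identity (the straight-through estimator) so gradients still update the full-precision weights.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor fake quantization: round weights to a
    low-bit grid, then dequantize back to float for the forward pass."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

# In QAT, the backward pass ignores the rounding (straight-through
# estimator): gradients computed at fake_quantize(w) are applied
# directly to the underlying full-precision w.
```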
arXiv:2602.15571v1 Announce Type: new Abstract: Predictive coding (PC) is a biologically inspired algorithm for training neural networks that relies only on local updates, allowing parallel …
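The locality property mentioned above can be illustrated with a single linear layer (assumed notation, not the paper's): the weight update depends only on the layer's own prediction error and input, with no globally backpropagated signal, which is what makes layer-parallel training possible.

```python
import numpy as np

def pc_step(W, x, target, lr=0.1):
    """One predictive-coding-style update for a linear layer.

    The update uses only quantities local to the layer: its input x
    and its prediction error (target - W @ x)."""
    prediction = W @ x
    error = target - prediction          # local prediction error
    W_new = W + lr * np.outer(error, x)  # Hebbian-style local update
    return W_new, error
```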
arXiv:2602.15572v1 Announce Type: new Abstract: Agent-based modelling (ABM) is a widespread approach to simulate complex systems. Advancements in computational processing and storage have facilitated the …
arXiv:2602.15586v1 Announce Type: new Abstract: This paper provides statistical guarantees on the accuracy of dynamical models learned from dependent data sequences. Specifically, we develop uniform …
arXiv:2602.15593v1 Announce Type: new Abstract: Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight …
arXiv:2602.15595v1 Announce Type: new Abstract: In this paper, we formulate the new multi-objective coverage (MOC) problem where our goal is to identify a small set …