Hybrid Associative Memories
arXiv:2603.22325v1 Announce Type: new Abstract: Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory …
Quality follows upgrading
Category
arXiv:2603.22325v1 Announce Type: new Abstract: Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory …
arXiv:2603.22326v1 Announce Type: new Abstract: Decision support systems are essential for maintaining grid stability in low-carbon power systems, such as wind power plants, by providing …
arXiv:2603.22328v1 Announce Type: new Abstract: Despite the strong predictive performance achieved by machine learning models across many application domains, assessing their trustworthiness through reliable estimates …
arXiv:2603.22329v1 Announce Type: new Abstract: Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) …
arXiv:2603.22331v1 Announce Type: new Abstract: Every wildfire prediction model deployed today shares a dangerous property: none of these methods provides formal guarantees on how much …
arXiv:2603.22332v1 Announce Type: new Abstract: Data imputation is a cornerstone technique for handling missing values in real-world datasets, which are often plagued by missingness. Despite …
arXiv:2603.22333v1 Announce Type: new Abstract: State-space models (SSMs) offer efficient alternatives to attention with linear-time recurrence. Mamba2, a recent SSM-based language model, uses selective input …
arXiv:2603.22339v1 Announce Type: new Abstract: Chinchilla Approach 2 is among the most widely used methods for fitting neural scaling laws. Its parabolic approximation introduces systematic …
arXiv:2603.22343v1 Announce Type: new Abstract: Photovoltaic (PV) power forecasting in edge-enabled grids requires balancing forecasting accuracy, robustness under weather-driven distribution shifts, and strict latency constraints. …
arXiv:2603.22346v1 Announce Type: new Abstract: We isolate and empirically characterize first-mover bias -- a path-dependent concentration of feature importance caused by sequential residual fitting in …
arXiv:2603.22348v1 Announce Type: new Abstract: Online learning algorithms often faces a fundamental trilemma: balancing regret guarantees between adversarial and stochastic settings and providing baseline safety …
arXiv:2603.22352v1 Announce Type: new Abstract: Recent progress in reinforcement learning with verifiable rewards (RLVR) offers a practical path to self-improvement of language models, but existing …