[Re] FairDICE: A Gap Between Theory And Practice
arXiv:2603.03454v1 (announce type: new). Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline …
All Articles
arXiv:2603.03459v1 (announce type: new). Abstract: We investigate when transformer MLP nonlinearity is actually necessary. A gate with $d+1$ parameters decides when to replace the full …
arXiv:2603.03464v1 (announce type: new). Abstract: We introduce Graph Hopfield Networks, whose energy function couples associative memory retrieval with graph Laplacian smoothing for node classification. Gradient …
arXiv:2603.03469v1 (announce type: new). Abstract: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce …
arXiv:2603.03475v1 (announce type: new). Abstract: Mathematical reasoning models are widely deployed in education, automated tutoring, and decision support systems despite exhibiting fundamental computational instabilities. We …
arXiv:2603.03480v1 (announce type: new). Abstract: We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of …
arXiv:2603.03484v1 (announce type: new). Abstract: E-fuels are promising long-term energy carriers supporting the net-zero transition. However, the large combinatorial design-operation spaces under renewable uncertainty make …
arXiv:2603.03491v1 (announce type: new). Abstract: Compute-in-memory (CiM) architectures promise significant improvements in energy efficiency and throughput for deep neural network acceleration by alleviating the von …
arXiv:2603.03507v1 (announce type: new). Abstract: Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in …
arXiv:2603.03511v1 (announce type: new). Abstract: We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination …
arXiv:2603.03523v1 (announce type: new). Abstract: We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from …
arXiv:2603.03524v1 (announce type: new). Abstract: As strong general reasoners, large language models (LLMs) encounter diverse domains and tasks, where the ability to adapt and self-improve …