Academic

Academic · 1 min

LLM Olympiad: Why Model Evaluation Needs a Sealed Exam

arXiv:2603.23292v1 Announce Type: new Abstract: Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are increasingly easy to …

Jan Christian Blaise Cruz, Alham Fikri Aji

4 views Mar 25

Academic · 1 min

Online library learning in human visual puzzle solving

arXiv:2603.23244v1 Announce Type: new Abstract: When learning a novel complex task, people often form efficient reusable abstractions that simplify future work, despite uncertainty about the …

Pinzhe Zhao, Emanuele Sansone, Marta Kryven, Bonan Zhao

3 views Mar 25

Academic · 1 min

MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation

arXiv:2603.23234v1 Announce Type: new Abstract: Large language model (LLM)-based agents rely on memory mechanisms to reuse knowledge from past problem-solving experiences. Existing approaches typically construct …

Yurui Chang, Yiran Wu, Qingyun Wu, Lu Lin

3 views Mar 25

Academic · 1 min

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

arXiv:2603.23231v1 Announce Type: new Abstract: Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior …

Shuochen Liu, Junyi Zhu, Long Shu, Junda Lin, Yuhao Chen, Haotian Zhang, Chao Zhang, Derong Xu, Jia Li, Bo Tang, Zhiyu Li, Feiyu Xiong, Enhong Chen, Tong Xu

4 views Mar 25

Academic · 1 min

SAiW: Source-Attributable Invisible Watermarking for Proactive Deepfake Defense

arXiv:2603.23178v1 Announce Type: new Abstract: Deepfakes generated by modern generative models pose a serious threat to information integrity, digital identity, and public trust. Existing detection …

Bibek Das, Chandranath Adak, Soumi Chattopadhyay, Zahid Akhtar, Soumya Dutta

3 views Mar 25

Academic · 1 min

Describe-Then-Act: Proactive Agent Steering via Distilled Language-Action World Models

arXiv:2603.23149v1 Announce Type: new Abstract: Deploying safety-critical agents requires anticipating the consequences of actions before they are executed. While world models offer a paradigm for …

Massimiliano Pappa, Luca Romani, Valentino Sacco, Alessio Palma, Stéphane Lathuilière, Fabio Galasso, Xavier Alameda-Pineda, Indro Spinelli

4 views Mar 25

Academic · 1 min

Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment

arXiv:2603.23114v1 Announce Type: new Abstract: A human's moral decision depends heavily on the context. Yet research on LLM morality has largely studied fixed scenarios. We …

Adrian Sauter, Mona Schirmer

26 views Mar 25

Academic · 1 min

MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models

arXiv:2603.23085v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) …

Jianxin Lin, Chunzheng Zhu, Peter J. Kneuertz, Yunfei Bai, Yuan Xue

3 views Mar 25

Academic · 1 min

Minibal: Balanced Game-Playing Without Opponent Modeling

arXiv:2603.23059v1 Announce Type: new Abstract: Recent advances in game AI, such as AlphaZero and Athénan, have achieved superhuman performance across a wide range of board …

Quentin Cohen-Solal, Tristan Cazenave

15 views Mar 25

Academic · 1 min

Can Large Language Models Reason and Optimize Under Constraints?

arXiv:2603.23004v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated great capabilities across diverse natural language tasks; yet their ability to solve abstraction and …

Fabien Bernier, Salah Ghamizi, Pantelis Dogoulis, Maxime Cordy

12 views Mar 25

Academic · 1 min

On the use of Aggregation Operators to improve Human Identification using Dental Records

arXiv:2603.23003v1 Announce Type: new Abstract: The comparison of dental records is a standardized technique in forensic dentistry used to speed up the identification of individuals …

Antonio D. Villegas-Yeguas, Guillermo R-García, Tzipi Kahana, Jorge Pinares Toledo, Esi Sharon, Oscar Ibañez, Oscar Cordón

9 views Mar 25

Academic · 1 min

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

arXiv:2603.22978v1 Announce Type: new Abstract: In the maintenance of complex systems, fault trees are used to locate problems and provide targeted solutions. To enable fault …

Yuhui Wang, Zhixiong Yang, Ming Zhang, Shihan Dou, Zhiheng Xi, Enyu Zhou, Senjie Jin, Yujiong Shen, Dingwei Zhu, Yi Dong, Tao Gui, Qi Zhang, Xuanjing Huang

7 views Mar 25
