All Articles

Articles

Academic · 1 min

Interactive Benchmarks

arXiv:2603.04737v1 Announce Type: new Abstract: Standard benchmarks have become increasingly unreliable due to saturation, subjectivity, and poor generalization. We argue that evaluating model's ability to …

Baoqing Yue, Zihan Zhu, Yifan Zhang, Jichen Feng, Hufei Yang, Mengdi Wang
5 views
Academic · 1 min

Evaluating the Search Agent in a Parallel World

arXiv:2603.04751v1 Announce Type: new Abstract: Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating …

Jiawei Chen, Xintian Shen, Lihao Zheng, Lifu Mu, Haoyi Sun, Ning Mao, Hao Ma, Tao Wei, Pan Zhou, Kun Zhan
8 views
Academic · 1 min

MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem

arXiv:2603.04756v1 Announce Type: new Abstract: MOOSEnger is a tool-enabled AI agent tailored to the Multiphysics Object-Oriented Simulation Environment (MOOSE). MOOSE cases are specified in HIT …

Mengnan Li, Jason Miller, Zachary Prince, Alexander Lindsay, Cody Permann
12 views
Academic · 1 min

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

arXiv:2603.04791v1 Announce Type: new Abstract: We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each …

Yong Liu, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, Mingsheng Long
5 views