LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

arXiv:2603.12152v1. Abstract: The rapid advancement of large language models (LLMs) has accelerated progress toward universal AI assistants. However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cognitive states. To bridge this gap, we propose LifeSim, a user simulator that models user cognition through the Belief-Desire-Intention (BDI) model within physical environments for coherent life trajectories generation, and simulates intention-driven user interactive behaviors. Based on LifeSim, we introduce LifeSim-Eval, a comprehensive benchmark for multi-scenario, long-horizon personalized assistance. LifeSim-Eval covers 8 life domains and 1,200 diverse scenarios, and adopts a multi-turn interactive method to assess models' abilities to complete explicit and implicit intentions, recover user profiles, and produce high-quality responses. Unde

Feiyu Duan, Xuanjing Huang, Zhongyu Wei · March 13, 2026

#cs.CL

Executive Summary

This article presents LifeSim, a user simulator that models user cognition with the Belief-Desire-Intention (BDI) model inside physical environments, generating coherent life trajectories and simulating intention-driven interactive behavior. Built on LifeSim, the LifeSim-Eval benchmark covers 8 life domains and 1,200 scenarios and uses a multi-turn interactive method to assess how well models complete explicit and implicit intentions, recover user profiles, and produce high-quality responses. Experiments under both single-scenario and long-horizon settings reveal significant limitations in current LLMs, notably in handling implicit intentions and modeling long-term user preferences. The framework aims to narrow the gap between existing benchmarks and real-world user-assistant interactions, which matters for the development of universal AI assistants.

Key Points

▸ LifeSim models user cognition through the BDI model, simulating intention-driven user interactive behaviors in physical environments.
▸ The LifeSim-Eval benchmark assesses how well LLMs handle implicit intentions and model long-term user preferences.
▸ The experiments reveal significant limitations in current LLMs under both single-scenario and long-horizon settings.
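For readers unfamiliar with BDI, the loop named in the key points can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the LifeSim implementation: the class `SimulatedUser`, its methods, and the representation of desires as `(goal, priority, precondition)` tuples are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SimulatedUser:
    """Hypothetical BDI-style user state (illustration only, not LifeSim's code)."""
    # Beliefs: what the user currently holds true about the environment.
    beliefs: dict = field(default_factory=dict)
    # Desires: (goal, priority, precondition over beliefs) tuples.
    desires: list = field(default_factory=list)
    # Intention: the single goal the user is currently committed to.
    intention: Optional[str] = None

    def perceive(self, event: dict) -> None:
        # Belief revision: new observations overwrite older beliefs.
        self.beliefs.update(event)

    def deliberate(self) -> None:
        # Commit to the highest-priority desire whose precondition holds.
        feasible = [(goal, prio) for goal, prio, cond in self.desires
                    if cond(self.beliefs)]
        self.intention = max(feasible, key=lambda d: d[1])[0] if feasible else None

    def act(self) -> str:
        # Turn the current intention into an utterance for the assistant.
        if self.intention is None:
            return ""
        return f"I'd like help with: {self.intention}"


def wants_dinner(beliefs: dict) -> bool:
    return beliefs.get("hungry", False)


def can_run(beliefs: dict) -> bool:
    return beliefs.get("weather") == "sunny"


user = SimulatedUser(desires=[("order dinner", 2, wants_dinner),
                              ("go for a run", 1, can_run)])
user.perceive({"weather": "sunny"})
user.deliberate()
print(user.act())
```

The point of the sketch is the separation of concerns: environment events change beliefs, beliefs gate which desires are feasible, and only the selected intention drives what the simulated user says. An implicit intention, in this toy framing, would be one the user holds but does not state outright.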

Merits

Strength

The proposed framework bridges the gap between existing benchmarks and real-world user-assistant interactions, providing a comprehensive evaluation of personalized assistants.

Accuracy

The LifeSim-Eval benchmark covers 8 life domains and 1,200 diverse scenarios, adopting a multi-turn interactive method to evaluate models' abilities.

Demerits

Limitation

The study relies on the Belief-Desire-Intention (BDI) model, which may not fully capture the complexity of human cognition and behavior.

Scope

The study focuses on personalized assistants, but the applicability of the proposed framework to other domains, such as healthcare or finance, is unclear.

Expert Commentary

The article makes a useful contribution to human-computer interaction and AI assistant evaluation. LifeSim and its benchmark, LifeSim-Eval, support a broad evaluation of personalized assistants and highlight the limitations of current LLMs in handling implicit intentions and modeling long-term user preferences. These findings bear on the development of universal AI assistants and the design of intelligent systems. The main caveats are the reliance on the BDI model and the study's restriction to personalized assistants. Even so, the framework could narrow the gap between existing benchmarks and real-world user-assistant interactions.

Recommendations

✓ Future studies should investigate the applicability of the proposed framework to other domains, such as healthcare or finance.
✓ The development of more sophisticated models of human cognition and behavior is necessary to fully capture the complexity of human interaction with AI-powered systems.

Sources

arXiv - cs.CL

LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

Related Articles

ConstitutionGPT: An AI-Powered Multilingual Legal Assistance System for Indian Citizens

AI Copyright Infringement: Navigating the Legal Risks of AI-Generated Content

The Rhetoric of Machine Learning

Busemann energy-based attention for emotion analysis in Poincaré discs
