SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts
arXiv:2603.06636v1 Announce Type: new Abstract: Due to the strong context-awareness capabilities demonstrated by large language models (LLMs), recent research has begun exploring their integration into smart home assistants to help users manage and adjust their living environments. While LLMs have been shown to effectively understand user needs and provide appropriate responses, most existing studies primarily focus on interpreting and executing user behaviors or instructions. However, a critical function of smart home assistants is the ability to detect when the home environment is in an anomalous state. This involves two key requirements: the LLM must accurately determine whether an anomalous condition is present, and provide either a clear explanation or actionable suggestions. To enhance the anomaly detection capabilities of next-generation LLM-based smart home assistants, we introduce SmartBench, which is the first smart home dataset designed for LLMs, containing both normal and anomalous device states as well as normal and anomalous device state transition contexts. We evaluate 13 mainstream LLMs on this benchmark. The experimental results show that most state-of-the-art models cannot achieve good anomaly detection performance. For example, Claude-Sonnet-4.5 achieves only 66.1% detection accuracy on context-independent anomaly categories, and performs even worse on context-dependent anomalies, with an accuracy of only 57.8%. More experimental results suggest that next-generation LLM-based smart home assistants are still far from being able to effectively detect and handle anomalous conditions in the smart home environment. Our dataset is publicly available at https://github.com/horizonsinzqs/SmartBench.
Executive Summary
This article introduces SmartBench, a novel dataset for evaluating large language models (LLMs) in smart home settings with anomalous device states and behavioral contexts. The authors assess 13 mainstream LLMs on this benchmark, revealing significant limitations in their anomaly detection capabilities. Even state-of-the-art models struggle: Claude-Sonnet-4.5, for example, reaches only 66.1% detection accuracy on context-independent anomalies and 57.8% on context-dependent ones. The study underscores the need for further research and development before next-generation LLM-based smart home assistants can reliably detect and handle anomalous conditions.
Key Points
- ▸ SmartBench is the first smart home dataset designed for LLMs, addressing a critical gap in existing research.
- ▸ The dataset includes normal and anomalous device states, as well as normal and anomalous device state transition contexts.
- ▸ Most state-of-the-art LLMs performed poorly in anomaly detection tasks, highlighting a pressing need for improvement.
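The benchmark task described above boils down to a binary decision per record (anomalous or not) scored against ground truth. A minimal sketch of such an evaluation loop, where the record format and the `classify` stub are hypothetical stand-ins for an LLM call, not SmartBench's actual schema or harness:

```python
# Hypothetical sketch: record format and classifier are illustrative only,
# not the actual SmartBench schema or evaluation code.

def classify(record: dict) -> bool:
    """Stand-in for an LLM call that answers: 'Is this device state
    anomalous?' Here a toy heuristic replaces the model."""
    return record["state"].get("temperature_c", 20) > 60

def detection_accuracy(records: list[dict]) -> float:
    """Fraction of records whose prediction matches the ground-truth label."""
    correct = sum(classify(r) == r["is_anomalous"] for r in records)
    return correct / len(records)

records = [
    {"device": "oven",   "state": {"temperature_c": 230}, "is_anomalous": True},
    {"device": "fridge", "state": {"temperature_c": 4},   "is_anomalous": False},
    {"device": "heater", "state": {"temperature_c": 22},  "is_anomalous": False},
]
print(detection_accuracy(records))  # 1.0 on this toy sample
```

Context-dependent anomalies, where the paper reports the weakest results, would additionally require passing the preceding device-state transitions into `classify` rather than a single snapshot.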
Merits
Strength in Research Design
The authors demonstrate a comprehensive understanding of the challenges in smart home anomaly detection and design a dataset that effectively addresses these challenges, providing a valuable resource for the research community.
Demerits
Limited Generalizability
The study focuses on a specific set of 13 mainstream LLMs, which may not be representative of the broader range of LLMs available, potentially limiting the generalizability of the findings.
Need for Larger-Scale Evaluation
The evaluation covers a relatively small set of models, and the reported results may not transfer to larger or more complex systems, whose behavior in smart home settings could be more nuanced. A larger-scale evaluation would strengthen the benchmark's conclusions.
Expert Commentary
The introduction of SmartBench marks an important step forward in the evaluation of LLMs in smart home settings. However, the study's limitations highlight the need for more comprehensive and nuanced assessments of LLM performance in this area. Future research should aim to develop larger-scale datasets and evaluate a broader range of LLMs to better understand their capabilities and limitations in smart home anomaly detection. Additionally, the study's findings underscore the importance of developing more effective anomaly detection techniques to support the safe and secure deployment of smart home technologies.
Recommendations
- ✓ Future work should build larger-scale datasets and benchmark a broader range of LLMs, so that the field gains a clearer picture of model capabilities and limitations in smart home anomaly detection.
- ✓ Developing more effective anomaly detection techniques is crucial to support the safe and secure deployment of smart home technologies, and researchers should prioritize this area of investigation.