Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL
arXiv:2603.19611v1 Abstract: In-Context Learning (ICL) enables pretrained LLMs to adapt to downstream tasks by conditioning on a small set of input-output demonstrations, without any parameter updates. Although there have been many theoretical efforts to explain how ICL works, most either rely on strong architectural or data assumptions, or fail to capture the impact of key practical factors such as demonstration selection, Chain-of-Thought (CoT) prompting, the number of demonstrations, and prompt templates. We address this gap by establishing a theoretical analysis of ICL under mild assumptions that links these design choices to generalization behavior. We derive an upper bound on the ICL test loss, showing that performance is governed by (i) the quality of selected demonstrations, quantified by Lipschitz constants of the ICL loss along paths connecting test prompts to pretraining samples, (ii) an intrinsic ICL capability of the pretrained model, and (iii) the degree of distribution shift. Within the same framework, we analyze CoT prompting as inducing a task decomposition and show that it is beneficial when demonstrations are well chosen at each substep and the resulting subtasks are easier to learn. Finally, we characterize how the sensitivity of ICL performance to prompt templates varies with the number of demonstrations. Together, our study shows that pretraining equips the model with the ability to generalize beyond observed tasks, CoT enables the model to compose simpler subtasks into more complex ones, and demonstrations and instructions enable it to retrieve similar tasks or subtasks that can be composed into more complex ones, jointly supporting generalization to unseen tasks. All theoretical insights are corroborated by experiments.
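Based only on the abstract's description of factors (i)-(iii), a bound of this kind might take the following schematic form; the notation is illustrative, and the paper's exact statement is not reproduced here:

```latex
\[
\underbrace{\mathcal{L}_{\mathrm{test}}}_{\text{ICL test loss}}
\;\le\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n} L_i \,\ell(\gamma_i)}_{\text{(i) demonstration quality}}
\;+\;
\underbrace{\varepsilon_{\mathrm{ICL}}}_{\text{(ii) intrinsic ICL capability}}
\;+\;
\underbrace{\Delta_{\mathrm{shift}}}_{\text{(iii) distribution shift}}
\]
```

Here \(L_i\) would be a Lipschitz constant of the ICL loss along a path \(\gamma_i\) connecting the test prompt to the \(i\)-th pretraining sample, \(\ell(\gamma_i)\) that path's length, \(\varepsilon_{\mathrm{ICL}}\) the pretrained model's intrinsic ICL error, and \(\Delta_{\mathrm{shift}}\) a distribution-shift penalty. Well-chosen demonstrations shrink the first term by placing the test prompt on short, low-Lipschitz paths to pretraining data.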
Executive Summary
This article presents a theoretical analysis of In-Context Learning (ICL), a technique that enables pretrained large language models (LLMs) to adapt to downstream tasks from a small set of input-output demonstrations, without any parameter updates. The authors derive an upper bound on the ICL test loss showing that performance is governed by three factors: the quality of the selected demonstrations (quantified by Lipschitz constants of the ICL loss along paths connecting test prompts to pretraining samples), the pretrained model's intrinsic ICL capability, and the degree of distribution shift. The analysis links design choices such as demonstration selection, Chain-of-Thought (CoT) prompting, and prompt templates to generalization behavior: pretraining equips the model with the ability to generalize beyond observed tasks, CoT induces a decomposition into simpler subtasks, and demonstrations and instructions enable retrieval of related tasks. All theoretical insights are corroborated by experiments, and the framework has implications for developing more efficient and effective ICL methods.
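The CoT finding, that prompting helps when each substep has well-chosen demonstrations and the resulting subtasks are easier to learn, can be made concrete with a minimal prompt-assembly sketch. The function name and template below are hypothetical illustrations, not a format taken from the paper; the paper's contribution is the analysis of when such decomposition lowers the loss bound, not a specific prompt syntax.

```python
def build_cot_prompt(subtasks, demos_per_subtask, question):
    """Assemble a CoT prompt that decomposes a task into substeps,
    pairing each substep with its own demonstrations.

    subtasks: list of substep names, e.g. ["parse", "compute"]
    demos_per_subtask: one list of (input, output) pairs per substep,
        so each (presumably easier) subtask gets its own well-chosen
        demonstrations, mirroring the condition under which the
        paper's analysis finds CoT beneficial.
    """
    lines = []
    for step, (name, demos) in enumerate(zip(subtasks, demos_per_subtask), 1):
        lines.append(f"Step {step}: {name}")
        for x, y in demos:
            lines.append(f"  Example: {x} -> {y}")
    lines.append(f"Question: {question}")
    lines.append("Let's solve it step by step.")
    return "\n".join(lines)
```

For example, `build_cot_prompt(["parse the expression"], [[("2+3", "5")]], "4+7")` yields a prompt whose single step carries its own demonstration before the final question.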
Key Points
- ▸ Established a theoretical analysis of ICL under mild assumptions
- ▸ Derived an upper bound on the ICL test loss in terms of demonstration quality, intrinsic ICL capability, and distribution shift
- ▸ Linked design choices (demonstration selection, CoT prompting, number of demonstrations, prompt templates) to generalization behavior
- ▸ Showed CoT prompting helps when each substep has well-chosen demonstrations and the resulting subtasks are easier to learn
- ▸ Characterized how sensitivity to prompt templates varies with the number of demonstrations
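The demonstration-quality term of the bound suggests preferring demonstrations that lie "close" to the test prompt. A minimal sketch of such a selector, using cosine similarity in an embedding space as an illustrative proxy for the small path-length and Lipschitz terms (the paper does not prescribe this particular selection rule):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_demonstrations(test_vec, pool, k=4):
    """Return indices of the k pool embeddings most similar to the
    test prompt's embedding, best first. Proximity here stands in
    for short, low-Lipschitz paths from the test prompt to
    pretraining-like samples (an illustrative heuristic, not the
    paper's method)."""
    ranked = sorted(range(len(pool)), key=lambda i: -cosine(test_vec, pool[i]))
    return ranked[:k]
```

With a toy pool `[[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]` and test embedding `[1, 0]`, the selector ranks the exactly matching and the nearly aligned candidates first.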
Merits
Strength
The article provides a comprehensive theoretical framework for understanding ICL, a crucial step toward developing more efficient and effective ICL methods. Because the analysis rests on mild assumptions rather than strong architectural or data requirements, its conclusions apply across a broad range of models and settings.
Demerits
Limitation
The article focuses primarily on the theoretical aspects of ICL and may not provide sufficient empirical evidence to support its claims. Additionally, the experiments conducted may not be representative of real-world scenarios, which could limit the article's practical implications.
Expert Commentary
This article represents a significant contribution to the understanding of ICL, a central technique in modern natural language processing. The authors' theoretical framework offers insight into the mechanisms behind ICL and could inform practical choices in demonstration selection, CoT prompting, and template design, improving performance while reducing the cost of trial-and-error prompt engineering. The main caveat is the article's primarily theoretical focus: its guidance will carry the most weight once validated more broadly in applied settings where ICL is used to generalize to unseen tasks.
Recommendations
- ✓ Future research should focus on developing demonstration-selection and CoT strategies that directly exploit the derived bound.
- ✓ The article's findings should be empirically validated across various domains and scenarios to confirm their practical implications.
Sources
Original: arXiv - cs.LG