Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length

arXiv:2603.22608v1 (Announce Type: new)

Abstract: Users often rely on Large Language Models (LLMs) for processing multiple documents or performing analysis over a number of instances. For example, analysing the overall sentiment of a number of movie reviews requires an LLM to process the sentiment of each review individually in order to provide a final aggregated answer. While LLM performance on such individual tasks is generally high, there has been little research on how LLMs perform when dealing with multi-instance inputs. In this paper, we perform a comprehensive evaluation of the multi-instance processing (MIP) ability of LLMs for tasks in which they excel individually. The results show that all LLMs follow a pattern of slight performance degradation for small numbers of instances (approximately 20-100), followed by a performance collapse on larger instance counts. Crucially, our analysis shows that while context length is associated with this degradation, the number of instances has a stronger effect on the final results. This finding suggests that when optimising LLM performance for MIP, attention should be paid to both context length and, in particular, instance count.
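
To make the task setup concrete, here is a minimal sketch of a multi-instance prompt in the spirit of the movie-review example above: many instances are packed into a single call and the model is asked for one aggregated answer. The prompt wording and the `llm_call` helper are assumptions for illustration, not the paper's protocol.

```python
def build_mip_prompt(reviews: list[str]) -> str:
    """Pack N review instances into one prompt and request an aggregate answer."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return (
        "Classify the sentiment (positive or negative) of each review below, "
        "then report the overall majority sentiment.\n\n"
        f"{numbered}\n\nFinal answer (positive or negative):"
    )

reviews = [
    "A stunning, heartfelt film with a career-best lead performance.",
    "Dull plot, wooden acting, and an ending that lands with a thud.",
]
prompt = build_mip_prompt(reviews)
# answer = llm_call(prompt)  # llm_call is a hypothetical LLM client wrapper
```

Note that the instance count and the total context length both grow as more reviews are added, which is precisely why the paper has to disentangle the two factors.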

Executive Summary

This article examines performance degradation of Large Language Models (LLMs) on multi-instance processing (MIP) tasks. The study finds that LLMs decline only slightly at small instance counts (roughly 20 to 100 instances) but collapse at larger counts, and that while context length correlates with the degradation, instance count is the stronger driver. Optimizing LLM performance for MIP therefore requires attention to both factors, instance count in particular.

Key Points

  • LLMs exhibit performance degradation in multi-instance processing tasks
  • Instance count has a stronger effect on performance degradation than context length (see the sketch after this list)
  • Optimizing LLM performance requires consideration of both context length and instance count
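
The second point implies a controlled comparison: to attribute degradation to instance count rather than context length, the two must be varied independently. Below is a rough sketch of one way to construct such conditions, padding inputs so that length stays roughly fixed while instance count varies; the filler text and the word-count proxy for tokens are assumptions for illustration, not the paper's methodology.

```python
FILLER = " lorem ipsum"  # neutral padding text; an assumption for illustration

def make_condition(instances: list[str], target_words: int) -> str:
    """Join the instances, then pad with filler so every condition reaches
    a comparable length (approximated here by word count)."""
    body = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(instances))
    while len(body.split()) < target_words:
        body += FILLER
    return body

pool = [f"Review {i}: the film was perfectly fine." for i in range(200)]
# Vary instance count (20 vs 100) while holding length fixed at ~2000 words:
few_instances = make_condition(pool[:20], target_words=2000)
many_instances = make_condition(pool[:100], target_words=2000)
```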

Merits

Comprehensive Evaluation

The study provides a thorough examination of LLM performance on multi-instance processing tasks, shedding light on a previously under-researched area.

Demerits

Limited Generalizability

The evaluation covers tasks at which LLMs already excel individually, so the findings may not extend to other LLMs or task types, limiting the study's generalizability.

Expert Commentary

The findings carry practical weight for developing and deploying LLMs in real-world pipelines that process many documents at once. Recognizing that instance count, even more than context length, drives performance degradation allows researchers and practitioners to design around the failure mode, for example by bounding the number of instances handled per call rather than only trimming context. The study's emphasis on systematic evaluation beyond single-instance benchmarks also underscores the importance of ongoing research in this area.

Recommendations

  • Conduct further research to explore the generalizability of the findings to other LLMs and tasks
  • Develop strategies to optimize LLM performance for multi-instance processing that account for both context length and, especially, instance count, for example by chunking inputs into small batches (see the sketch below)
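
One such strategy, sketched here under stated assumptions: keep each LLM call within the regime where per-instance accuracy is still high (the paper observes only slight degradation up to roughly 20-100 instances) and move the aggregation step out of the model into code. `classify_batch` is a hypothetical wrapper around whatever LLM client is in use, not an API from the paper.

```python
from collections import Counter

CHUNK_SIZE = 20  # stay at or below the instance count where degradation begins

def classify_batch(reviews: list[str]) -> list[str]:
    """Placeholder for an LLM call that returns one sentiment label per review."""
    raise NotImplementedError("wire up an LLM client here")

def aggregate_sentiment(reviews: list[str]) -> str:
    """Classify reviews in small chunks, then aggregate deterministically in
    code instead of asking the model to count across hundreds of items."""
    labels: list[str] = []
    for i in range(0, len(reviews), CHUNK_SIZE):
        labels.extend(classify_batch(reviews[i : i + CHUNK_SIZE]))
    return Counter(labels).most_common(1)[0][0]
```

The design choice here is to let the LLM do only what it is reliably good at (per-instance classification) and handle the cross-instance aggregation with ordinary code, which sidesteps the instance-count collapse entirely.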

Sources

Original: arXiv - cs.AI (https://arxiv.org/abs/2603.22608)