Training-Free Agentic AI: Probabilistic Control and Coordination in Multi-Agent LLM Systems

arXiv:2603.13256v1 Announce Type: new Abstract: Multi-agent large language model (LLM) systems enable complex, long-horizon reasoning by composing specialized agents, but practical deployment remains hindered by inefficient routing, noisy feedback, and high interaction cost. We introduce REDEREF, a lightweight and training-free controller for multi-agent LLM collaboration that improves routing efficiency during recursive delegation. REDEREF integrates (i) belief-guided delegation via Thompson sampling to prioritize agents with historically positive marginal contributions, (ii) reflection-driven re-routing using a calibrated LLM or programmatic judge, (iii) evidence-based selection rather than output averaging, and (iv) memory-aware priors to reduce cold-start inefficiency. Across multi-agent split-knowledge tasks, we show that while recursive retry alone saturates task success, belief-guided routing reduces token usage by 28%, agent calls by 17%, and time-to-success by 19% compared to random recursive delegation, and adapts gracefully under agent or judge degradation. These results demonstrate that simple, interpretable probabilistic control can meaningfully improve the efficiency and robustness of multi-agent LLM systems without training or fine-tuning.
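The abstract's component (i), belief-guided delegation via Thompson sampling, can be illustrated with a minimal Beta-Bernoulli sketch. The class, agent names, and success rates below are illustrative assumptions, not the paper's implementation:

```python
import random

class BeliefRouter:
    """Minimal sketch of belief-guided delegation via Thompson sampling.

    Each agent's chance of a positive marginal contribution is modeled
    with a Beta distribution; we draw a sample from each belief, delegate
    to the agent with the highest draw, and update the posterior from
    binary success/failure feedback (e.g. a judge's verdict).
    """

    def __init__(self, agent_ids, prior=(1.0, 1.0)):
        # Beta(alpha, beta) belief per agent; (1, 1) is a uniform prior.
        self.beliefs = {a: list(prior) for a in agent_ids}

    def select(self):
        # Thompson sampling: one draw per posterior, pick the argmax.
        draws = {a: random.betavariate(alpha, beta)
                 for a, (alpha, beta) in self.beliefs.items()}
        return max(draws, key=draws.get)

    def update(self, agent_id, success):
        # Bernoulli feedback: success increments alpha, failure beta.
        alpha, beta = self.beliefs[agent_id]
        self.beliefs[agent_id] = [alpha + 1, beta] if success else [alpha, beta + 1]


random.seed(0)
router = BeliefRouter(["retriever", "coder", "critic"])
# Simulated feedback: "coder" succeeds 80% of the time, the others 20%.
rates = {"retriever": 0.2, "coder": 0.8, "critic": 0.2}
for _ in range(200):
    agent = router.select()
    router.update(agent, random.random() < rates[agent])
# With enough feedback the posteriors typically concentrate on the
# stronger agent, so it is selected increasingly often.
```

Because exploration is driven by posterior sampling rather than a tuned schedule, this kind of controller needs no training, which matches the paper's training-free framing.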

Executive Summary

This article proposes REDEREF, a training-free controller for multi-agent large language model (LLM) systems aimed at improving routing efficiency and reducing interaction costs. REDEREF integrates four key components: belief-guided delegation, reflection-driven re-routing, evidence-based selection, and memory-aware priors. On multi-agent split-knowledge tasks, belief-guided routing reduces token usage by 28%, agent calls by 17%, and time-to-success by 19% relative to random recursive delegation. The findings suggest that simple, interpretable probabilistic control can improve the efficiency and robustness of multi-agent LLM systems without any training or fine-tuning. These results matter most for applications where efficient LLM collaboration is crucial, such as complex decision-making tasks or real-time information processing. Further research is warranted to explore the scalability and adaptability of REDEREF in diverse settings.

Key Points

  • REDEREF is a lightweight and training-free controller for multi-agent LLM collaboration
  • REDEREF improves routing efficiency and reduces interaction costs through four key components
  • Experiments on split-knowledge tasks show 28% lower token usage, 17% fewer agent calls, and 19% lower time-to-success versus random recursive delegation
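Component (iv), memory-aware priors, can be sketched as a warm start for the routing beliefs: rather than beginning every task from a uniform prior, remembered per-agent success rates are encoded as pseudo-observations. The function and parameter names below are illustrative assumptions, not the paper's API:

```python
def warm_start_beliefs(agent_ids, memory, weight=5.0, default=(1.0, 1.0)):
    """Sketch of memory-aware priors (illustrative, not the paper's code).

    `memory` maps agent ids to success rates observed on similar past
    tasks. Each remembered rate is converted into `weight` pseudo-counts
    of a Beta prior, so a historically strong agent starts with a head
    start instead of being rediscovered from scratch, which is the
    cold-start inefficiency the abstract refers to.
    """
    beliefs = {}
    for a in agent_ids:
        rate = memory.get(a)
        if rate is None:
            beliefs[a] = list(default)  # no history: fall back to uniform
        else:
            # rate -> Beta(1 + weight*rate, 1 + weight*(1 - rate))
            beliefs[a] = [1.0 + weight * rate, 1.0 + weight * (1.0 - rate)]
    return beliefs


memory = {"coder": 0.9, "critic": 0.3}  # past per-agent success rates
beliefs = warm_start_beliefs(["retriever", "coder", "critic"], memory)
# "coder" now starts with a concentrated, favorable prior, while the
# unseen "retriever" keeps the uninformative Beta(1, 1) prior.
```

The `weight` pseudo-count controls how strongly memory biases early routing: a small value lets fresh feedback override stale history quickly, a large value trusts memory more.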

Merits

Strength in Interpretable Design

The incorporation of probabilistic control and evidence-based selection allows for a transparent and explainable decision-making process.

Scalability and Adaptability

REDEREF's modular design and its graceful adaptation under agent or judge degradation make it a promising solution for diverse applications.

Demerits

Potential Overreliance on Calibration

The article's reliance on a calibrated LLM or programmatic judge for reflection-driven re-routing may lead to performance degradation in uncalibrated or dynamic environments.

Limited Evaluation on Real-World Tasks

While the experimental results are promising, further evaluation on real-world tasks and applications is necessary to assess the practicality and scalability of REDEREF.

Expert Commentary

The development of REDEREF is a substantial contribution to the field of multi-agent LLM systems. However, further work is needed to assess its scalability and adaptability in diverse settings, and both the method's dependence on a well-calibrated judge and the limited evaluation on real-world tasks warrant attention. Nevertheless, the findings have the potential to significantly impact applications ranging from complex decision-making tasks to real-time information processing.

Recommendations

  • Future research should focus on evaluating REDEREF on real-world tasks and applications to assess its practicality and scalability.
  • Investigating the potential of REDEREF in dynamic environments and uncalibrated or uncertain settings would be beneficial to understand its limitations and areas for improvement.
