Orla: A Library for Serving LLM-Based Multi-Agent Systems

Abstract (arXiv:2603.13605v1): We introduce Orla, a library for constructing and running LLM-based agentic systems. Modern agentic applications consist of workflows that combine multiple LLM inference steps, tool calls, and heterogeneous infrastructure. Today, developers typically build these systems by manually composing orchestration code with LLM serving engines and tool execution logic. Orla provides a general abstraction that separates request execution from workflow-level policy. It acts as a serving layer above existing LLM inference engines: developers define workflows composed of stages, while Orla manages how those stages are mapped, executed, and coordinated across models and backends. It provides agent-level control through three mechanisms: a stage mapper, which assigns each stage to an appropriate model and backend; a workflow orchestrator, which schedules stages and manages their resources and context; and a memory manager, which manages inference state such as the KV cache across workflow boundaries. We demonstrate Orla with a customer support workflow that exercises many of its capabilities. We evaluate Orla on two datasets, showing that stage mapping improves latency and cost compared to a single-model vLLM baseline, while workflow-level cache management reduces time-to-first-token.
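The abstract's memory manager reuses inference state (the KV cache) across workflow boundaries to reduce time-to-first-token. Conceptually this is prefix reuse: stages that share a prompt prefix skip recomputing its prefill. The toy in-process cache below illustrates the idea only; it implies nothing about Orla's actual API, and all names are invented for this sketch.

```python
import hashlib

class PrefixCache:
    """Toy stand-in for workflow-level KV-cache reuse: stages that share a
    prompt prefix skip recomputing it."""

    def __init__(self):
        self._store = {}

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get_or_compute(self, prefix: str, compute):
        """Return the cached 'KV state' for this prefix, computing it once."""
        k = self._key(prefix)
        if k not in self._store:
            # In a real engine this is the expensive prefill pass.
            self._store[k] = compute(prefix)
        return self._store[k]

# Two stages of one workflow share the same system prompt + ticket history.
cache = PrefixCache()
prefill_calls = []
fake_prefill = lambda p: prefill_calls.append(p) or len(p)  # records "prefill" work

cache.get_or_compute("system prompt + ticket history", fake_prefill)
cache.get_or_compute("system prompt + ticket history", fake_prefill)
print(len(prefill_calls))  # 1: the second stage reused the cached prefix
```

The point of managing this at the workflow level, rather than per request, is that the serving layer knows which stages belong to the same workflow and can keep their shared prefix resident.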

Executive Summary

The article introduces Orla, a library designed to simplify the construction and execution of LLM-based multi-agent systems. Orla provides a general abstraction that separates request execution from workflow-level policy, acting as a serving layer above existing LLM inference engines. Developers define workflows composed of stages, which Orla then maps, executes, and coordinates across models and backends. On two evaluation datasets, the authors report improved latency and cost over a single-model vLLM baseline, and they demonstrate the library's capabilities with a customer support workflow.

Key Points

  • Orla provides a general abstraction for LLM-based multi-agent systems
  • It separates request execution from workflow-level policy
  • The library includes mechanisms for stage mapping, workflow orchestration, and memory management
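The paper does not publish Orla's API, but the workflow-of-stages idea behind these mechanisms can be sketched in a few lines of Python. All class and function names below are hypothetical illustrations, not Orla's interface:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Stage:
    """One step in a workflow: an LLM inference call or a tool call."""
    name: str
    run: Callable[[dict], dict]  # reads the shared context, returns updates

@dataclass
class Workflow:
    """A linear pipeline of stages sharing a mutable context dict."""
    stages: list[Stage] = field(default_factory=list)

    def execute(self, context: dict) -> dict:
        # An orchestrator like the one the paper describes would also handle
        # scheduling, backend selection, and cache placement here.
        for stage in self.stages:
            context.update(stage.run(context))
        return context

# Usage: a toy two-stage "customer support" pipeline with stub stages.
classify = Stage("classify", lambda ctx: {"intent": "refund"})
respond = Stage("respond", lambda ctx: {"reply": f"Handling {ctx['intent']} request"})
result = Workflow([classify, respond]).execute({"ticket": "I want my money back"})
print(result["reply"])  # Handling refund request
```

The separation the paper argues for is visible even in this sketch: stages declare *what* runs, while the executor decides *how and where* each stage runs.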

Merits

Improved Efficiency

Stage mapping improves latency and cost relative to a single-model vLLM baseline, and workflow-level KV-cache management reduces time-to-first-token.
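The paper does not specify the stage mapper's selection policy. As one plausible illustration (the backend table, pricing, and policy here are all invented assumptions), a mapper might pick the cheapest backend that satisfies a stage's quality and latency requirements:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    name: str
    quality: int               # illustrative capability tier
    cost_per_1k_tokens: float  # illustrative pricing
    p50_latency_ms: float      # illustrative median latency

BACKENDS = [
    Backend("small-local",   quality=1, cost_per_1k_tokens=0.05, p50_latency_ms=120),
    Backend("medium-hosted", quality=2, cost_per_1k_tokens=0.50, p50_latency_ms=300),
    Backend("large-hosted",  quality=3, cost_per_1k_tokens=3.00, p50_latency_ms=900),
]

def map_stage(min_quality: int, latency_budget_ms: float) -> Backend:
    """Cheapest backend meeting the stage's quality and latency requirements."""
    eligible = [b for b in BACKENDS
                if b.quality >= min_quality and b.p50_latency_ms <= latency_budget_ms]
    if not eligible:
        raise ValueError("no backend satisfies the stage's requirements")
    return min(eligible, key=lambda b: b.cost_per_1k_tokens)

# An easy classification stage vs. a harder generation stage:
print(map_stage(min_quality=1, latency_budget_ms=500).name)  # small-local
print(map_stage(min_quality=2, latency_budget_ms=500).name)  # medium-hosted
```

Routing cheap stages to small models while reserving large models for demanding stages is the intuition behind the reported latency and cost gains over running every stage on one model.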

Simplified Development

The library's abstraction and automation of workflow management simplify the development process for LLM-based multi-agent systems.

Demerits

Limited Scalability

The article does not provide detailed information on Orla's scalability, which may be a concern for large-scale deployments.

Dependence on Existing Infrastructure

Orla relies on existing LLM inference engines and may be limited by their capabilities and constraints.

Expert Commentary

The introduction of Orla represents a meaningful step forward for LLM-based multi-agent systems. By abstracting and automating workflow management, Orla has the potential to simplify development and improve the efficiency of these systems. However, further work is needed to address open questions around scalability and the constraints Orla inherits from the underlying inference engines. As LLM-based systems see wider deployment, frameworks like this will also need to support transparency, accountability, and security.

Recommendations

  • Further research on Orla's scalability and performance in large-scale deployments
  • Development of regulatory frameworks that address the unique challenges and opportunities of AI-powered systems
