From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

arXiv:2603.22386v1 Abstract: Large language model (LLM)-based systems are becoming increasingly popular for solving tasks by constructing executable workflows that interleave LLM calls, information retrieval, tool use, code execution, memory updates, and verification. This survey reviews recent methods for designing and optimizing such workflows, which we treat as agentic computation graphs (ACGs). We organize the literature based on when workflow structure is determined, where structure refers to which components or agents are present, how they depend on each other, and how information flows between them. This lens distinguishes static methods, which fix a reusable workflow scaffold before deployment, from dynamic methods, which select, generate, or revise the workflow for a particular run before or during execution. We further organize prior work along three dimensions: when structure is determined, what part of the workflow is optimized, and which evaluation signals guide optimization (e.g., task metrics, verifier signals, preferences, or trace-derived feedback). We also distinguish reusable workflow templates, run-specific realized graphs, and execution traces, separating reusable design choices from the structures actually deployed in a given run and from realized runtime behavior. Finally, we outline a structure-aware evaluation perspective that complements downstream task metrics with graph-level properties, execution cost, robustness, and structural variation across inputs. Our goal is to provide a clear vocabulary, a unified framework for positioning new methods, a more comparable view of the existing body of literature, and a more reproducible evaluation standard for future work in workflow optimization for LLM agents.

Executive Summary

This survey paper offers a structured framework for understanding the evolution of workflow optimization in LLM agent systems, distinguishing between static workflow templates and dynamic runtime graphs. By classifying workflow design decisions along three axes (when structure is determined, what is optimized, and which evaluation signals inform optimization), the authors provide a coherent taxonomy that aids comparative analysis. The distinction between reusable templates, run-specific realized graphs, and execution traces clarifies conceptual boundaries, while the emphasis on structure-aware evaluation introduces a novel dimension for assessing robustness, cost, and variation beyond traditional task metrics. The paper’s contribution lies in offering a unified vocabulary and evaluative standard, enabling more reproducible research and informed decision-making in LLM agent development.

Key Points

  • The paper introduces a taxonomy for workflow optimization by categorizing when workflow structure is determined (static vs. dynamic).
  • It distinguishes between reusable workflow templates, run-specific realized graphs, and execution traces as conceptual layers.
  • A structure-aware evaluation framework is proposed, complementing task metrics with graph-level properties, cost, and structural variation.
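The three conceptual layers in the second point can be sketched as a minimal data model. This is an illustrative reading of the paper's distinction, not its notation; all class and field names below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowTemplate:
    """Reusable scaffold fixed before deployment (the static design choice)."""
    name: str
    # Component roles (e.g. "planner", "retriever", "verifier") and the
    # dependency edges between them, shared across all runs.
    roles: list[str] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class RealizedGraph:
    """Run-specific graph instantiated from a template for one input,
    possibly after a dynamic method adds, prunes, or rewires components."""
    template: WorkflowTemplate
    nodes: list[str] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class ExecutionTrace:
    """What actually happened at runtime: the ordered node invocations."""
    graph: RealizedGraph
    steps: list[str] = field(default_factory=list)  # invocation order
```

In this reading, static methods optimize `WorkflowTemplate`, dynamic methods decide `RealizedGraph` per run, and trace-derived feedback is computed over `ExecutionTrace`.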

Merits

Conceptual Clarity

Provides a unified vocabulary and framework that aligns prior work under a shared lexicon, reducing ambiguity in literature review.

Evaluation Innovation

Introduces a structure-aware evaluation perspective that enhances reproducibility and depth of analysis beyond conventional metrics.
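A structure-aware evaluation of the kind the survey describes could, hypothetically, report graph-level properties alongside task metrics. The sketch below (both function names and the variation measure are assumptions, not the paper's definitions) computes graph depth and a simple notion of structural variation across per-input realized graphs.

```python
from collections import defaultdict


def graph_depth(edges, root):
    """Longest path length from `root` in a DAG given as (src, dst) edges."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    def depth(node):
        if not children[node]:
            return 0
        return 1 + max(depth(child) for child in children[node])

    return depth(root)


def structural_variation(realized_graphs):
    """Fraction of distinct topologies across (nodes, edges) pairs.

    A purely static method yields one topology for every input
    (variation 0.0); a dynamic method that reshapes the graph per
    input yields many distinct topologies (variation approaching 1.0).
    """
    topologies = {frozenset(edges) for _, edges in realized_graphs}
    return (len(topologies) - 1) / max(len(realized_graphs) - 1, 1)
```

For example, three runs producing two distinct topologies would score a variation of 0.5 under this (assumed) measure, alongside whatever task metric the benchmark reports.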

Demerits

Limited Scope

While comprehensive within its domain, the survey does not address implementation details or performance benchmarks of specific tools, limiting applicability for practitioners seeking actionable code-level guidance.

Expert Commentary

The survey represents a significant step toward institutionalizing methodological rigor in LLM agent workflow optimization. Historically, the field has suffered from fragmented terminology and evaluative inconsistency, which this work directly confronts. By foregrounding the temporal dimension of workflow structure—specifically, the decision point between pre-deployment scaffolding and runtime adaptation—the authors align with emerging trends in agentic computation that prioritize adaptability over rigid architecture. Moreover, the elevation of execution traces from mere artifacts to evaluative signals marks a notable shift: it reframes workflow optimization from a static design problem into a dynamic, feedback-driven process. This aligns with broader computational paradigms in AI, such as reinforcement learning and adaptive systems, suggesting potential cross-disciplinary applicability. The authors’ restraint in favoring a neutral, descriptive framework over prescriptive technical advice enhances the paper’s utility as a reference point for diverse stakeholders, from industry adopters to academic theorists.

Recommendations

  • Adopt the proposed taxonomy as a baseline for reporting workflow design in LLM agent literature to promote comparability.
  • Develop open-source templates or benchmark suites that operationalize the structure-aware evaluation metrics for empirical validation.
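As a hypothetical illustration of the first recommendation, a paper could report its workflow design as a small structured record keyed by the survey's three dimensions. The field names and value vocabulary below are assumptions for illustration, not an existing standard.

```python
# Hypothetical structured report of a workflow design, keyed by the
# survey's three dimensions. Field names and values are illustrative.
workflow_report = {
    "structure_determined": "dynamic",           # static | dynamic
    "determination_point": "during_execution",   # pre_deployment | before_run | during_execution
    "optimized_part": ["topology", "prompts"],   # which parts of the workflow are optimized
    "evaluation_signals": ["task_metric", "verifier", "trace_feedback"],
}


def validate_report(report):
    """Check that a report names all three dimensions of the taxonomy."""
    required = {"structure_determined", "optimized_part", "evaluation_signals"}
    missing = required - report.keys()
    if missing:
        raise ValueError(f"report missing dimensions: {sorted(missing)}")
    return True
```

A shared record like this, checked mechanically, is one lightweight way such a baseline could make methods comparable across papers.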

Sources

Original: arXiv - cs.AI