DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
arXiv:2603.11076v1 Abstract: Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. Scaling diversity is difficult because training requires tasks to remain executable and verifiable, while generalization demands coverage of diverse tool types, toolset combinations, and heterogeneous tool-use patterns. We propose DIVE, an evidence-driven recipe that inverts synthesis order, executing diverse, real-world tools first and reverse-deriving tasks strictly entailed by the resulting traces, thereby providing grounding by construction. DIVE scales structural diversity along two controllable axes, tool-pool coverage and per-task toolset variety, and an Evidence Collection--Task Derivation loop further induces rich multi-step tool-use patterns across 373 tools in five domains. Training Qwen3-8B on DIVE data (48k SFT + 3.2k RL) improves by +22 average points across 9 OOD benchmarks and outperforms the strongest 8B baseline by +68. Remarkably, controlled scaling analysis reveals that diversity scaling consistently outperforms quantity scaling for OOD generalization, even with 4x less data.
Executive Summary
The article proposes DIVE, an approach to scaling diversity in agentic task synthesis for generalizable tool use in large language models. By inverting the synthesis order, executing diverse real-world tools first and deriving tasks from the resulting traces, DIVE provides grounding by construction and scales structural diversity along two controllable axes: tool-pool coverage and per-task toolset variety. Training Qwen3-8B on DIVE data yields a +22-point average gain across nine out-of-distribution benchmarks and outperforms the strongest 8B baseline by 68 points. The study also finds that diversity scaling consistently beats quantity scaling for out-of-distribution generalization, even with 4x less data.
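As an illustrative sketch only (not the authors' implementation; all function and variable names here are hypothetical), the inverted synthesis order described above, execute tools first, then derive a task strictly entailed by the resulting trace, could look like this:

```python
import random

def run_tool(tool, arg):
    """Stand-in for executing a real tool; records the call and its result."""
    return {"tool": tool, "arg": arg, "result": f"{tool}({arg})"}

def collect_evidence(toolset, steps=3):
    """Evidence Collection: run a short multi-step tool sequence, chaining outputs."""
    trace, arg = [], "seed"
    for _ in range(steps):
        record = run_tool(random.choice(toolset), arg)
        trace.append(record)
        arg = record["result"]  # feed each result into the next step
    return trace

def derive_task(trace):
    """Task Derivation: phrase a task whose answer is fixed by the executed trace,
    so the synthesized example is executable and verifiable by construction."""
    tools_used = [r["tool"] for r in trace]
    return {
        "question": f"Using {', '.join(tools_used)}, what is the final result?",
        "answer": trace[-1]["result"],  # ground truth comes from actual execution
        "toolset": sorted(set(tools_used)),
    }

tool_pool = ["search", "calculator", "weather", "calendar", "translator"]
toolset = random.sample(tool_pool, k=3)  # one sampled per-task toolset
task = derive_task(collect_evidence(toolset))
```

The key property this sketch tries to capture is that the answer is read off an executed trace rather than invented first, so every derived task is grounded and checkable.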
Key Points
- ▸ DIVE inverts the synthesis order to execute diverse real-world tools first
- ▸ The approach scales structural diversity along two controllable axes
- ▸ DIVE improves out-of-distribution generalization by 22 average points across 9 benchmarks
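To make the two diversity axes concrete, here is a minimal hypothetical sketch (names and metrics are assumptions for illustration, not DIVE's code) of sampling toolsets while tracking tool-pool coverage and per-task toolset variety:

```python
import random

def sample_toolsets(tool_pool, n_tasks, min_k=2, max_k=4, seed=0):
    """Sample one toolset per task, varying its size to diversify combinations."""
    rng = random.Random(seed)
    toolsets = []
    for _ in range(n_tasks):
        k = rng.randint(min_k, max_k)
        toolsets.append(frozenset(rng.sample(tool_pool, k)))
    return toolsets

def diversity_metrics(toolsets, tool_pool):
    """Measure both axes over a batch of sampled toolsets."""
    covered = set().union(*toolsets)
    coverage = len(covered) / len(tool_pool)      # axis 1: tool-pool coverage
    variety = len(set(toolsets)) / len(toolsets)  # axis 2: unique toolset combinations
    return coverage, variety

pool = [f"tool_{i}" for i in range(20)]
batch = sample_toolsets(pool, n_tasks=50)
coverage, variety = diversity_metrics(batch, pool)
```

Under this framing, scaling diversity means pushing both metrics up, touching more of the pool and repeating fewer toolset combinations, rather than simply generating more tasks.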
Merits
Effective Diversity Scaling
DIVE's diversity scaling yields large out-of-distribution gains: a +22-point average improvement across nine benchmarks, with diversity scaling outperforming quantity scaling even at 4x less data
Grounding by Construction
Because tasks are reverse-derived from executed tool traces, they are executable and verifiable by construction
Demerits
Limited Tool Coverage
The study only covers 373 tools in five domains, which may not be representative of all possible tool types and use cases
Computational Complexity
Executing real-world tools and reverse-deriving tasks at scale may require significant computational resources
Expert Commentary
The article presents a notable contribution to tool-using language models. DIVE addresses the challenge of scaling diversity in agentic task synthesis, which the authors identify as the key bottleneck for robust generalization, and its evidence-first design sidesteps the grounding problems of task-first synthesis. The reported results support the approach's effectiveness, and the finding that diversity scaling beats quantity scaling has practical implications for data curation in agent post-training. Further research is needed, however, to address the limited tool coverage and computational cost, and to test the recipe in additional domains.
Recommendations
- ✓ Future studies should investigate the applicability of the DIVE approach to other domains and tasks
- ✓ The development of more efficient and scalable methods for executing and reverse-deriving tasks is necessary to reduce computational complexity