
ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning


Yu Li, Rui Miao, Zhengling Qi, Tian Lan

arXiv:2603.16060v1 Announce Type: new Abstract: The dominant paradigm for improving mathematical reasoning in language models relies on Reinforcement Learning with verifiable rewards. Yet existing methods treat each problem instance in isolation without leveraging the reusable strategies that emerge and accumulate during training. To this end, we introduce ARISE (Agent Reasoning via Intrinsic Skill Evolution), a hierarchical reinforcement learning framework, in which a shared policy operates both to manage skills at high-level and to generate responses at low-level (denoted as a Skills Manager and a Worker, respectively). The Manager maintains a tiered skill library through a dedicated skill generation rollout that performs structured summarization of successful solution traces (after execution), while employing a policy-driven selection mechanism to retrieve relevant skills to condition future rollouts (before execution). A hierarchical reward design guides the co-evolution of reasoning ability and library quality. Experiments on two base models and seven benchmarks spanning both competition mathematics and Omni-MATH show that ARISE consistently outperforms GRPO-family algorithms and memory-augmented baselines, with particularly notable gains on out-of-distribution tasks. Ablation studies confirm that each component contributes to the observed improvements and that library quality and reasoning performance improve in tandem throughout training. Code is available at https://github.com/Skylanding/ARISE.

Executive Summary

The paper introduces ARISE, a hierarchical reinforcement learning framework that improves mathematical reasoning in language models by reusing strategies across problem instances. Unlike conventional RL methods that treat each instance in isolation, ARISE runs a Skills Manager and a Worker within a single shared policy: the Manager maintains a tiered skill library, growing it through structured summarization of successful solution traces after execution and retrieving relevant skills to condition future rollouts before execution. A hierarchical reward design drives the co-evolution of reasoning ability and library quality. Across two base models and seven benchmarks, ARISE consistently outperforms GRPO-family algorithms and memory-augmented baselines, with the largest gains on out-of-distribution tasks. Ablations show that each component contributes to the improvements and that library quality and reasoning performance rise in tandem during training. The released code supports reproducibility.
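The Manager/Worker loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `SkillLibrary`, `summarize_trace`, the keyword-overlap selection, and all parameter names are assumptions standing in for ARISE's learned, policy-driven components.

```python
# Hypothetical sketch of an ARISE-style skill loop; every name here is
# illustrative, not the paper's actual API.

class SkillLibrary:
    """In-memory tiered skill store."""

    def __init__(self):
        self.skills = []  # each skill: {"text": str, "tier": int}

    def select(self, problem, k=2):
        # Stand-in for policy-driven retrieval: rank stored skills by
        # naive keyword overlap with the problem statement.
        scored = sorted(
            self.skills,
            key=lambda s: len(set(s["text"].split()) & set(problem.split())),
            reverse=True,
        )
        return scored[:k]

    def add(self, skill_text, tier=0):
        self.skills.append({"text": skill_text, "tier": tier})


def summarize_trace(trace):
    # Stand-in for structured summarization of a successful trace:
    # distill the first solution step into a reusable strategy string.
    return "Strategy: " + trace[0]


def training_step(library, problem, solve, verify):
    skills = library.select(problem)          # before execution: retrieve
    context = [s["text"] for s in skills]
    trace, answer = solve(problem, context)   # low-level Worker rollout
    if verify(problem, answer):               # verifiable reward signal
        library.add(summarize_trace(trace))   # after execution: summarize
    return answer
```

In the actual framework both the selection mechanism and the summarization are performed by the shared policy and trained with RL; here they are fixed heuristics so the control flow stays visible.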

Key Points

  • ARISE is a hierarchical RL framework that improves mathematical reasoning through intrinsic skill evolution
  • A single shared policy acts as both a high-level Skills Manager and a low-level Worker
  • The Manager builds a tiered skill library by summarizing successful solution traces and retrieves relevant skills to condition future rollouts
  • A hierarchical reward design co-evolves reasoning ability and library quality
  • ARISE outperforms GRPO-family and memory-augmented baselines across seven benchmarks, with the strongest gains on out-of-distribution tasks

Merits

Novelty

ARISE innovates by leveraging accumulated strategies via a tiered skill library and structured summarization, offering a departure from isolated instance-based RL approaches.

Empirical Validation

Experiments on multiple benchmarks validate ARISE’s superiority over existing algorithms, with measurable gains on out-of-distribution tasks.

Component Synergy

Ablation studies confirm that each component contributes to the observed gains, and that the Manager's library quality and the Worker's reasoning performance improve in tandem.

Demerits

Complexity

The hierarchical architecture adds implementation complexity and computational overhead, which may limit scalability in real-world deployment.

Generalizability

Results are currently confined to mathematical reasoning benchmarks; applicability to broader domains remains unproven.

Expert Commentary

ARISE represents a significant conceptual advance in the application of hierarchical reinforcement learning to agent-based reasoning. The integration of a Skills Manager with a Worker within a shared policy architecture aligns with emerging trends in modular AI, particularly in domain-specific reasoning tasks where adaptability and reusability are paramount. The dedicated skill-generation rollout, which summarizes successful solution traces into reusable skills, is particularly compelling: it introduces a systematic, post-hoc analysis mechanism that lets the system evolve incrementally from empirical success rather than relying on static, pre-defined heuristics. Moreover, the hierarchical reward design aligns incentives for both reasoning accuracy and library quality, fostering a symbiotic evolution of capabilities. While the computational cost of maintaining a dynamic skill library may pose a practical hurdle, the empirical evidence suggests the trade-off is justified. This work sets a new benchmark for evaluating hierarchical RL in reasoning domains and provides a template for future iterations that may extend beyond mathematics to general-purpose problem-solving frameworks.
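One way to picture the incentive alignment discussed above is a two-level reward in which the Manager earns credit only when its retrieved skills precede a verified success. This is a minimal sketch under assumed semantics, not the paper's reward formula; `skill_bonus` and the function name are hypothetical.

```python
def hierarchical_reward(answer_correct: bool, skills_used: int,
                        skill_bonus: float = 0.1) -> tuple[float, float]:
    """Toy two-level reward: (worker_reward, manager_reward).

    The Worker gets a binary verifiable reward for a correct answer.
    The Manager is credited per retrieved skill, but only when the
    rollout succeeds, tying library quality to reasoning accuracy.
    """
    worker_r = 1.0 if answer_correct else 0.0
    manager_r = skill_bonus * skills_used if answer_correct else 0.0
    return worker_r, manager_r
```

Gating the Manager's reward on the Worker's success is what makes the two levels co-evolve: low-quality skills that do not help solve problems earn nothing and are not reinforced.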

Recommendations

  • Developers should explore lightweight implementations of the ARISE framework for deployment in latency-sensitive applications without compromising core functionality
