ARROW: Augmented Replay for RObust World models

arXiv:2603.11395v1. Abstract: Continual reinforcement learning challenges agents to acquire new skills while retaining previously learned ones, with the goal of improving performance on both past and future tasks. Most existing approaches rely on model-free methods with replay buffers to mitigate catastrophic forgetting; however, these solutions often face significant scalability challenges due to large memory demands. Drawing inspiration from neuroscience, where the brain replays experiences to a predictive world model rather than directly to the policy, we present ARROW (Augmented Replay for RObust World models), a model-based continual RL algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay buffer. Unlike standard fixed-size FIFO buffers, ARROW maintains two complementary buffers: a short-term buffer for recent experiences and a long-term buffer that preserves task diversity through intelligent sampling. We evaluate ARROW on two challenging continual RL settings: tasks without shared structure (Atari), and tasks with shared structure, where knowledge transfer is possible (Procgen CoinRun variants). Compared to model-free and model-based baselines with replay buffers of the same size, ARROW demonstrates substantially less forgetting on tasks without shared structure, while maintaining comparable forward transfer. Our findings highlight the potential of model-based RL and bio-inspired approaches for continual reinforcement learning, warranting further research.

Executive Summary

The article presents ARROW, a neuroscience-inspired, model-based continual reinforcement learning algorithm that retains previously learned skills while acquiring new ones. ARROW extends DreamerV3 with a memory-efficient, distribution-matching replay buffer comprising a short-term buffer for recent experiences and a long-term buffer that preserves task diversity. Evaluations on two challenging continual RL settings, Atari (tasks without shared structure) and Procgen CoinRun variants (tasks with shared structure), show that ARROW substantially reduces forgetting on tasks without shared structure while maintaining comparable forward transfer. These findings highlight the potential of model-based RL and bio-inspired approaches for continual reinforcement learning. The proposed method offers a promising answer to the scalability challenges faced by existing model-free methods with large replay buffers.

Key Points

  • ARROW is a model-based continual reinforcement learning algorithm inspired by neuroscience.
  • ARROW extends DreamerV3 with a memory-efficient, distribution-matching replay buffer.
  • Evaluations on Atari and Procgen CoinRun settings demonstrate ARROW's ability to reduce forgetting and maintain forward transfer.
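The dual-buffer idea in the points above can be made concrete with a small sketch. The paper's exact distribution-matching sampler is not described in the abstract, so the sketch below uses reservoir sampling, a common stand-in that keeps an approximately uniform sample over the whole experience stream; all class and method names here are illustrative, not ARROW's actual API.

```python
import random

class DualReplayBuffer:
    """Sketch of a two-buffer replay scheme: a short-term FIFO buffer for
    recent experience plus a long-term buffer that preserves diversity
    across the full stream via reservoir sampling (an assumption standing
    in for the paper's distribution-matching sampler)."""

    def __init__(self, short_capacity, long_capacity, seed=0):
        self.short = []                  # FIFO of recent transitions
        self.long = []                   # reservoir over the full stream
        self.short_capacity = short_capacity
        self.long_capacity = long_capacity
        self.n_seen = 0                  # total transitions ever added
        self.rng = random.Random(seed)

    def add(self, transition):
        # Short-term buffer: plain FIFO eviction.
        self.short.append(transition)
        if len(self.short) > self.short_capacity:
            self.short.pop(0)
        # Long-term buffer: reservoir sampling retains each stream item
        # with probability long_capacity / n_seen, so old tasks survive.
        self.n_seen += 1
        if len(self.long) < self.long_capacity:
            self.long.append(transition)
        else:
            j = self.rng.randrange(self.n_seen)
            if j < self.long_capacity:
                self.long[j] = transition

    def sample(self, batch_size, recent_fraction=0.5):
        # Mix recent and long-term experience in each training batch.
        n_recent = min(int(batch_size * recent_fraction), len(self.short))
        n_old = min(batch_size - n_recent, len(self.long))
        return (self.rng.sample(self.short, n_recent)
                + self.rng.sample(self.long, n_old))
```

Because both buffers have fixed capacities, memory use stays constant no matter how many tasks the agent sees, which is the scalability property the abstract emphasizes.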

Merits

Addressing Scalability Challenges

ARROW tackles the scalability challenges of existing model-free replay-based methods by replacing a single large buffer with a compact, memory-efficient dual-buffer design.

Inspiration from Neuroscience

The proposed method draws inspiration from neuroscience, where the brain is thought to replay experiences to a predictive world model rather than directly to the policy; ARROW mirrors this by routing replayed experience through its learned world model.
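This replay routing can be sketched as a single training step, with stub classes standing in for Dreamer-style components; all names and method signatures here are illustrative assumptions, not the paper's API.

```python
# Stubs standing in for Dreamer-style components (illustrative only).
class WorldModel:
    def __init__(self):
        self.updates = 0
    def update(self, batch):
        # e.g. reconstruction + latent-dynamics losses on real data
        self.updates += 1
    def encode(self, batch):
        return [0.0 for _ in batch]      # placeholder latent states
    def rollout(self, policy, states, horizon):
        # "Imagine" trajectories in latent space, one per start state.
        return [(s, policy.act(s)) for s in states for _ in range(horizon)]

class Policy:
    def __init__(self):
        self.updates = 0
    def act(self, state):
        return 0
    def update(self, imagined):
        # e.g. actor-critic losses on imagined returns
        self.updates += 1

def train_step(world_model, policy, replay_buffer, horizon=15):
    # 1. Replayed real experience trains only the predictive world model.
    batch = replay_buffer.sample(64)
    world_model.update(batch)
    # 2. The policy never consumes replayed data directly: it learns
    #    from latent rollouts imagined by the world model.
    start_states = world_model.encode(batch)
    imagined = world_model.rollout(policy, start_states, horizon)
    policy.update(imagined)
```

The design choice to isolate the policy from raw replay is what lets a small, diverse buffer suffice: the world model absorbs old-task data, and the policy is refreshed from the model's imagination.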

Demerits

Limited Evaluation Settings

The article evaluates ARROW on only two settings (Atari and Procgen CoinRun variants); further research is required to establish its applicability to a broader range of scenarios.

Dependence on DreamerV3

ARROW is built on the DreamerV3 algorithm, which may limit how readily its results generalize to other model-based RL approaches.

Expert Commentary

The article presents a novel approach to the scalability challenges that replay buffers pose for continual reinforcement learning. ARROW's memory-efficient, distribution-matching replay buffer is a significant innovation, and its neuroscience-inspired design, replaying experience to a predictive world model rather than to the policy, is a compelling framing. While the findings are promising, further research is needed to establish ARROW's generalizability to a broader range of scenarios and to assess how strongly its results depend on the DreamerV3 backbone. The implications are substantial for domains that demand continual learning, with potential applications in education, healthcare, and autonomous systems.

Recommendations

  • Future research should investigate the applicability of ARROW to other model-based RL approaches and explore its extension to more complex scenarios.
  • The findings of this article highlight the importance of bio-inspired approaches in addressing the challenges of continual reinforcement learning, and further research in this area is warranted.
