MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

arXiv:2603.17187v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. We present MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw employs two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories via an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). This is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. These mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support and query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% and increases composite robustness by 18.3%. Code is available at https://github.com/aiming-lab/MetaClaw.
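The abstract describes an Opportunistic Meta-Learning Scheduler (OMLS) that triggers gradient-based updates during user-inactive windows by monitoring system inactivity and calendar data. A minimal sketch of that idea is below; the class name, thresholds, and data model are illustrative assumptions, not the paper's implementation:

```python
import time
from dataclasses import dataclass, field

@dataclass
class OpportunisticScheduler:
    """Hypothetical OMLS-style trigger: train only when the user is idle
    and no calendar event is imminent. All parameters are assumptions."""
    idle_threshold_s: float = 1800.0                      # minimum quiet period before training
    last_activity_ts: float = field(default_factory=time.time)
    calendar_events: list = field(default_factory=list)   # upcoming event start timestamps

    def record_activity(self):
        self.last_activity_ts = time.time()

    def _next_event_gap(self, now):
        upcoming = [t for t in self.calendar_events if t > now]
        return min(upcoming) - now if upcoming else float("inf")

    def should_train(self, min_window_s=3600.0):
        now = time.time()
        idle = now - self.last_activity_ts
        # Fire only if the user has been quiet long enough AND the next
        # calendar event leaves a window wide enough for a fine-tuning round.
        return idle >= self.idle_threshold_s and self._next_event_gap(now) >= min_window_s
```

In this sketch, a long LoRA fine-tuning round would be launched only when `should_train` returns true, and any user interaction (`record_activity`) immediately closes the window.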

Executive Summary

MetaClaw is a continual meta-learning framework that addresses the tension between providing continuous service and updating an agent's capabilities to match shifting task distributions. It employs two complementary mechanisms: skill-driven fast adaptation, which analyzes failure trajectories via an LLM evolver to synthesize new skills with zero downtime, and opportunistic policy optimization, which performs gradient-based updates via cloud LoRA fine-tuning and reinforcement learning with a process reward model during user-inactive windows. The mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. Reported results show skill-driven adaptation alone improving accuracy by up to 32% relative, and the full pipeline raising Kimi-K2.5 accuracy from 21.4% to 40.6% while increasing composite robustness by 18.3%. Because the architecture is proxy-based, the framework scales to production-size LLMs without local GPUs, making it a promising candidate for real-world deployment.
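The skill-driven fast-adaptation loop described above can be sketched as follows. Here `llm_evolve` is a hypothetical stand-in for the paper's LLM evolver, and the `Skill` data model is an assumption; only the overall flow (failures in, reusable skills out, available immediately) follows the abstract:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    instruction: str   # behavioral guidance injected into future prompts

def llm_evolve(trajectory: dict) -> Skill:
    """Placeholder for an LLM call that distills a failure into a skill."""
    step = trajectory["failed_step"]
    return Skill(name=f"avoid_{step}",
                 instruction=f"Before '{step}', verify its preconditions.")

def fast_adapt(skill_library: dict, trajectories: list) -> dict:
    """Synthesize skills from failures only; successes are left alone."""
    for traj in trajectories:
        if traj["success"]:
            continue
        skill = llm_evolve(traj)
        skill_library[skill.name] = skill   # usable on the very next task, no retraining
    return skill_library
```

The key property this sketch illustrates is that skill synthesis is a pure library update, which is why it can run with zero downtime while gradient-based policy updates wait for an idle window.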

Key Points

  • MetaClaw is a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills.
  • The framework employs two complementary mechanisms: skill-driven fast adaptation and opportunistic policy optimization.
  • The mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization; skill-driven adaptation alone delivers immediate improvement with zero downtime.
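Because the two mechanisms feed each other data, the abstract notes a versioning mechanism that separates support and query data to prevent contamination. A sketch of one plausible split is below; the version-based rule and field names are assumptions, not the paper's exact scheme:

```python
def split_by_version(trajectories, train_version):
    """Support set: trajectories produced under `train_version`;
    query set: held-out trajectories from strictly newer versions,
    so an update is never evaluated on the data it was trained on."""
    support = [t for t in trajectories if t["version"] == train_version]
    query = [t for t in trajectories if t["version"] > train_version]
    return support, query
```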

Merits

Strength in Adaptability

MetaClaw's ability to adapt to shifting task distributions without requiring disruptive downtime is a significant advantage over existing methods.

Scalability

Thanks to its proxy-based architecture, MetaClaw scales to production-size LLMs without requiring local GPUs, lowering the hardware barrier for real-world deployment.

Demerits

Complexity

The framework's complexity may make it challenging to implement and maintain, particularly for organizations without extensive AI expertise.

Dependence on Cloud Resources

The framework's reliance on cloud resources may limit its applicability in scenarios where cloud connectivity is unreliable or unavailable.

Expert Commentary

MetaClaw is a meaningful contribution to continual meta-learning for deployed agents. Its central appeal is operational: adaptation happens without taking the agent offline, with heavier gradient-based updates deferred to idle windows by the scheduler. The open questions are practical ones. The pipeline couples an LLM evolver, an opportunistic scheduler, versioned data management, and cloud fine-tuning, which raises the bar for implementation and maintenance, and its dependence on cloud resources limits use where connectivity is unreliable or unavailable. Even so, the reported gains are substantial, and further evaluation in real-world deployments is warranted.

Recommendations

  • Further research should evaluate MetaClaw in real-world deployments and address the operational complexity of its multi-component pipeline and its dependence on cloud resources.
  • The proxy-based design, which scales to production-size LLMs without local GPUs, makes the framework attractive for organizations without GPU infrastructure; however, such organizations may still need substantial AI expertise to operate and maintain the full pipeline, so its viability in that setting needs further evaluation.
