MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
arXiv:2603.17187v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. We present MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw employs two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories via an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). This is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. These mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support and query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% and increases composite robustness by 18.3%. Code is available at https://github.com/aiming-lab/MetaClaw.
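The skill-driven fast-adaptation loop described in the abstract can be illustrated with a minimal sketch. All names here (`Skill`, `SkillLibrary`, `evolve_skill`) are hypothetical: in MetaClaw the evolver is an LLM that diagnoses the failure and synthesizes a corrective skill, whereas this stub uses a trivial heuristic purely to show the data flow.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    instruction: str  # reusable behavioral guidance distilled from a failure

@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        # New skills take effect immediately: no retraining, no downtime.
        self.skills[skill.name] = skill

def evolve_skill(failure_trajectory: list[str]) -> Skill:
    """Stand-in for the LLM evolver: in the paper this step would prompt an
    LLM with the failed trajectory; here we just read off the final error."""
    root_cause = failure_trajectory[-1]  # naive heuristic for the sketch
    return Skill(
        name=f"avoid:{root_cause}",
        instruction=f"When a step risks '{root_cause}', verify inputs first.",
    )

library = SkillLibrary()
trajectory = ["open_file", "parse_json", "KeyError: missing field"]
library.add(evolve_skill(trajectory))
```

Because the synthesized skill is plain text added to a library, it can be injected into the agent's context on the very next request, which is what makes the adaptation "fast" relative to gradient-based updates.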
Executive Summary
MetaClaw is a continual meta-learning framework that addresses the tension between continuous service and the need to update capabilities as task distributions shift. It employs two complementary mechanisms: skill-driven fast adaptation, which analyzes failure trajectories via an LLM evolver to synthesize new skills with zero downtime, and opportunistic policy optimization, which performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM) during user-inactive windows. The two mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. On MetaClaw-Bench and AutoResearchClaw, skill-driven adaptation alone improves accuracy by up to 32% relative, and the full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% while increasing composite robustness by 18.3%. Because MetaClaw is built on a proxy-based architecture, it scales to production-size LLMs without local GPUs, making it a promising solution for real-world deployment.
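The abstract states that a versioning mechanism separates support and query data to prevent contamination, but does not specify the split. One plausible reading, sketched below with an invented `VersionedStore` class, tags each trajectory with the skill-library version under which it was produced, so a given version is never evaluated on data its own skills helped generate; this is an assumption for illustration, not the paper's mechanism.

```python
from collections import defaultdict

class VersionedStore:
    """Illustrative support/query separation. Trajectories produced under
    older skill-library versions serve as support (training) data for a
    target version; trajectories produced at or after the target version
    are held out as query (evaluation) data."""

    def __init__(self):
        self._by_version = defaultdict(list)

    def record(self, version: int, trajectory) -> None:
        self._by_version[version].append(trajectory)

    def support(self, target_version: int) -> list:
        # Training data: only trajectories from strictly earlier versions.
        return [t for v, ts in self._by_version.items()
                if v < target_version for t in ts]

    def query(self, target_version: int) -> list:
        # Held-out evaluation data: trajectories from this version onward.
        return [t for v, ts in self._by_version.items()
                if v >= target_version for t in ts]
```

The key invariant is that `support(v)` and `query(v)` are disjoint for every version `v`, which is the contamination guarantee the abstract alludes to.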
Key Points
- ▸ MetaClaw is a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills.
- ▸ The framework employs two complementary mechanisms: skill-driven fast adaptation and opportunistic policy optimization.
- ▸ The mechanisms are mutually reinforcing, allowing for immediate improvement with zero downtime.
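The Opportunistic Meta-Learning Scheduler (OMLS) is described as monitoring system inactivity and calendar data to trigger policy updates. A hedged sketch of such a gate follows; the function name, idle threshold, and one-hour lookahead are assumptions for illustration, not details from the paper.

```python
from datetime import datetime, timedelta

def in_user_inactive_window(last_activity: datetime,
                            now: datetime,
                            calendar_busy: list[tuple[datetime, datetime]],
                            idle_threshold: timedelta = timedelta(minutes=30)) -> bool:
    """Hypothetical OMLS gate: allow a cloud LoRA fine-tuning run only when
    the system has been idle past a threshold and no calendar event overlaps
    the assumed training horizon."""
    if now - last_activity < idle_threshold:
        return False  # user was active too recently
    horizon = now + timedelta(hours=1)  # assumed duration of a training run
    # Reject if any busy interval overlaps (now, horizon).
    return not any(start < horizon and end > now for start, end in calendar_busy)
```

Gating updates this way is what lets the gradient-based mechanism run without the disruptive downtime the abstract attributes to retraining-based baselines.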
Merits
Strength in Adaptability
MetaClaw's ability to adapt to shifting task distributions without requiring disruptive downtime is a significant advantage over existing methods.
Scalability
The framework's scalability to production-size LLMs without local GPUs makes it a promising solution for real-world applications.
Demerits
Complexity
The framework's complexity may make it challenging to implement and maintain, particularly for organizations without extensive AI expertise.
Dependence on Cloud Resources
The framework's reliance on cloud resources may limit its applicability in scenarios where cloud connectivity is unreliable or unavailable.
Expert Commentary
MetaClaw is a significant contribution to continual meta-learning for deployed LLM agents. Its ability to adapt to shifting task distributions without disruptive downtime is a major advantage over existing methods. However, its complexity and its dependence on cloud resources may limit applicability in some scenarios. The reported results are nonetheless promising, and further research is warranted to validate MetaClaw in real-world applications.
Recommendations
- ✓ Further research is needed to explore the potential of MetaClaw in real-world applications and to address the challenges associated with its complexity and dependence on cloud resources.
- ✓ The framework's scalability to production-size LLMs without local GPUs is attractive, but organizations with limited AI expertise should weigh this against the implementation complexity noted above; further evaluation is needed to confirm viability in such settings.