MAC: Multi-Agent Constitution Learning

Rushil Thareja, Gautam Gupta, Francesco Pinto, Nils Lukas

arXiv:2603.15968v1. Abstract: Constitutional AI is a method to oversee and control LLMs based on a set of rules written in natural language. These rules are typically written by human experts, but could in principle be learned automatically given sufficient training data for the desired behavior. Existing LLM-based prompt optimizers attempt this but are ineffective at learning constitutions since (i) they require many labeled examples and (ii) lack structure in the optimized prompts, leading to diminishing improvements as prompt size grows. To address these limitations, we propose Multi-Agent Constitutional Learning (MAC), which optimizes over structured prompts represented as sets of rules using a network of agents with specialized tasks to accept, edit, or reject rule updates. We also present MAC+, which improves performance by training agents on successful trajectories to reinforce updates leading to higher reward. We evaluate MAC on tagging Personally Identifiable Information (PII), a classification task with limited labels where interpretability is critical, and demonstrate that it generalizes to other agentic tasks such as tool calling. MAC outperforms recent prompt optimization methods by over 50%, produces human-readable and auditable rule sets, and achieves performance comparable to supervised fine-tuning and GRPO without requiring parameter updates.

Executive Summary

The paper proposes Multi-Agent Constitutional Learning (MAC), a method that learns a constitution (a set of natural-language rules) by optimizing structured prompts with a network of specialized agents that accept, edit, or reject rule updates. MAC targets two limitations of existing LLM-based prompt optimizers: they need many labeled examples, and their prompts lack structure, so improvements diminish as prompts grow. A variant, MAC+, additionally trains the agents on successful trajectories. The authors evaluate MAC on a Personally Identifiable Information (PII) tagging task and report that it generalizes to other agentic tasks, such as tool calling. According to the abstract, MAC outperforms recent prompt optimization methods by over 50% and performs comparably to supervised fine-tuning and GRPO without requiring parameter updates. Because the learned rule sets are human-readable, the approach is relevant to building more interpretable and auditable AI systems.

Key Points

  • MAC addresses limitations of existing LLM-based prompt optimizers
  • MAC optimizes over structured prompts represented as sets of rules using a network of agents
  • MAC generalizes to other agentic tasks, such as tool calling
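
The idea of a prompt structured as a rule set, with updates that are accepted or rejected, can be sketched as follows. This is a minimal illustration and not the authors' implementation: in MAC the updates come from specialized agents, whereas here `Proposal`, `apply_proposal`, `render_prompt`, `optimize`, and the toy `keyword_score` are all hypothetical names, the proposals are a fixed list, and the scorer is a simple keyword matcher standing in for an evaluation on labeled PII examples.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    """Hypothetical rule update: add a rule, edit one, or remove one."""
    kind: str                    # "add", "edit", or "remove"
    index: Optional[int] = None  # target rule for "edit" / "remove"
    text: Optional[str] = None   # new rule text for "add" / "edit"


def apply_proposal(rules: list[str], p: Proposal) -> list[str]:
    """Return a new rule list with the proposal applied (input is not mutated)."""
    updated = list(rules)
    if p.kind == "add":
        updated.append(p.text)
    elif p.kind == "edit":
        updated[p.index] = p.text
    elif p.kind == "remove":
        del updated[p.index]
    else:
        raise ValueError(f"unknown proposal kind: {p.kind}")
    return updated


def render_prompt(rules: list[str]) -> str:
    """Render the rule set as a numbered, human-readable prompt section."""
    return "\n".join(f"{i + 1}. {rule}" for i, rule in enumerate(rules))


def keyword_score(rules: list[str], examples: list[tuple[str, bool]]) -> int:
    """Toy stand-in for evaluation: a text is tagged as PII if any rule
    (treated here as a plain keyword) occurs in it. Returns the number of
    correctly labeled examples."""
    return sum(any(k in text for k in rules) == label for text, label in examples)


def optimize(rules: list[str],
             proposals: list[Proposal],
             score: Callable[[list[str]], float]):
    """Greedy loop: keep a proposed update only if the score strictly improves."""
    best = score(rules)
    log = []
    for p in proposals:
        candidate = apply_proposal(rules, p)
        s = score(candidate)
        if s > best:
            rules, best = candidate, s
            log.append(("accept", p))
        else:
            log.append(("reject", p))
    return rules, log


if __name__ == "__main__":
    examples = [("email me at a@b.com", True),
                ("call 555-1234", True),
                ("nice weather", False)]
    proposals = [Proposal("add", text="555"),
                 Proposal("add", text="weather"),
                 Proposal("edit", index=0, text="a")]
    final_rules, log = optimize(["@"], proposals,
                                lambda r: keyword_score(r, examples))
    print(render_prompt(final_rules))
    for decision, proposal in log:
        print(decision, proposal)
```

The decision log doubles as an audit trail: every rule in the final set can be traced to an accepted proposal.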

Merits

Addresses limitations of existing LLM-based prompt optimizers

Because MAC optimizes over a structured rule set rather than free-form prompt text, it targets both weaknesses the abstract attributes to existing optimizers: dependence on many labeled examples and diminishing improvements as prompt size grows.

Improvement in performance

MAC outperforms recent prompt optimization methods by over 50% and achieves performance comparable to supervised fine-tuning and GRPO without requiring parameter updates.

Enhancement of interpretability and auditability

MAC produces human-readable rule sets, so the learned behavior can be inspected and audited directly.

Demerits

Potentially significant computational cost

The abstract does not report computational cost. MAC itself requires no parameter updates, but running a network of agents may still be expensive, and MAC+ additionally trains the agents, which could limit practical use.

Potential for overfitting

The use of a network of agents to optimize prompts may lead to overfitting if the training data is not sufficiently diverse.
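
One generic way to guard against this kind of overfitting is to accept a rule update only when it does not hurt accuracy on held-out examples. The sketch below is an assumption for illustration and is not described in the abstract; `split` and `accept_update` are hypothetical helper names, and the accuracy values passed in would come from whatever evaluation is used.

```python
import random


def split(examples, val_fraction=0.3, seed=0):
    """Shuffle deterministically and split into (train, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def accept_update(train_before, train_after, val_before, val_after, tolerance=0.0):
    """Accept only if training accuracy improves and validation accuracy
    does not drop by more than `tolerance`."""
    return train_after > train_before and val_after >= val_before - tolerance
```

With few labels the validation split is small and noisy, so the `tolerance` parameter is there to avoid rejecting updates over a single held-out example.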

Expert Commentary

MAC appears to be a meaningful advance over existing LLM-based prompt optimizers, chiefly because its output is a human-readable rule set that can be audited. Two open concerns remain: the cost of running (and, for MAC+, training) a network of agents, and the risk of overfitting when the training data is not sufficiently diverse. Further research is needed to address these limitations and to explore MAC's potential applications in practical and policy contexts.

Recommendations

  • Investigate MAC's computational cost and overfitting risk, and test the method in practical and policy settings.
  • Interpretable and auditable AI systems are essential for building trust, and MAC is a step toward that goal.
