Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
arXiv:2603.06923v1 Announce Type: new Abstract: Large language models (LLMs) often exhibit flawed reasoning ability that undermines reliability. Existing approaches to improving reasoning typically treat it as a general and monolithic skill, applying broad training which is inefficient and unable to target specific reasoning errors. We introduce Reasoning Editing, a paradigm for selectively modifying specific reasoning patterns in LLMs while preserving other reasoning pathways. This task presents a fundamental trade-off between Generality, the ability of an edit to generalize across different tasks sharing the same reasoning pattern, and Locality, the ability to preserve other reasoning capabilities. Through systematic investigation, we uncover the Circuit-Interference Law: Edit interference between reasoning patterns is proportional to the overlap of their neural circuits. Guided by this principle, we propose REdit, the first framework to actively reshape neural circuits before editing, thereby modulating interference between reasoning patterns and mitigating the trade-off. REdit integrates three components: (i) Contrastive Circuit Reshaping, which directly addresses the generality-locality trade-off by disentangling overlapping circuits; (ii) Meta-Contrastive Learning, which extends transferability to novel reasoning patterns; and (iii) Dual-Level Protection, which preserves preexisting abilities by constraining reshaping update directions and regularizing task-level predictions. Extensive experiments with Qwen-2.5-3B on propositional logic reasoning tasks across three difficulty levels demonstrate that REdit consistently achieves superior generality and locality compared to baselines, with additional validation in mathematics showing broader potential. Our code is available at https://github.com/LzyFischer/REdit.
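The Circuit-Interference Law above makes a quantitative claim: edit interference between two reasoning patterns grows with the overlap of their neural circuits. As a minimal illustration only (the paper's actual circuit-attribution procedure is not described in this abstract), overlap can be scored as the Jaccard index of binary masks over model edges:

```python
import numpy as np

def circuit_overlap(mask_a, mask_b):
    """Jaccard overlap between two binary circuit masks.

    Each mask flags which model edges (e.g. attention-head or MLP
    connections) belong to a reasoning pattern's circuit. Under the
    Circuit-Interference Law, disjoint circuits (overlap near 0)
    should admit edits that barely disturb each other.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

# Two hypothetical 8-edge circuits sharing 2 edges:
pattern_1 = [1, 1, 1, 0, 0, 0, 1, 0]
pattern_2 = [0, 1, 1, 0, 1, 1, 0, 0]
print(circuit_overlap(pattern_1, pattern_2))  # → 0.3333333333333333
```

On this reading, reshaping circuits to shrink this quantity before editing is what lets REdit trade less generality for locality.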
Executive Summary
The paper introduces Reasoning Editing, a paradigm for selectively modifying specific reasoning patterns in large language models (LLMs) while leaving other reasoning pathways intact. The proposed framework, REdit, reshapes neural circuits before editing to mitigate the trade-off between generality and locality, achieving superior performance on propositional logic reasoning tasks. Its three components, Contrastive Circuit Reshaping, Meta-Contrastive Learning, and Dual-Level Protection, enable targeted edits that transfer to novel reasoning patterns without eroding preexisting abilities.
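Contrastive Circuit Reshaping is credited with disentangling overlapping circuits. A hedged sketch of the underlying idea, using a generic triplet hinge objective over circuit representations (the paper's exact loss is not reproduced here; all names and values below are illustrative):

```python
import numpy as np

def contrastive_reshape_loss(anchor, positive, negative, margin=1.0):
    """Triplet hinge loss over circuit representation vectors:
    pull circuits of the SAME reasoning pattern together and push
    circuits of DIFFERENT patterns apart, so that a later edit to
    one pattern interferes less with the others.

    anchor, positive, negative: (n, d) arrays of circuit vectors,
    where `positive` shares the anchor's reasoning pattern and
    `negative` does not.
    """
    anchor, positive, negative = map(np.asarray, (anchor, positive, negative))
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # same-pattern distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # cross-pattern distance
    # Penalize cases where a different-pattern circuit is not at least
    # `margin` farther from the anchor than the same-pattern circuit.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 1.0]])   # same reasoning pattern
n = np.array([[3.0, 0.0]])   # different reasoning pattern
print(contrastive_reshape_loss(a, p, n))  # → 0.0 (already disentangled)
```

Minimizing such a loss would drive the circuit-overlap term of the Circuit-Interference Law toward zero before any edit is applied.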
Key Points
- ▸ Introduction of Reasoning Editing paradigm for selective modification of LLMs' reasoning patterns
- ▸ Proposal of REdit framework with Contrastive Circuit Reshaping, Meta-Contrastive Learning, and Dual-Level Protection
- ▸ Experimental validation on Qwen-2.5-3B with superior generality and locality compared to baselines
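Dual-Level Protection is described as constraining reshaping update directions to preserve preexisting abilities. One plausible mechanism for the parameter-level half (an assumption; the abstract does not detail the actual constraint) is to project each update onto the orthogonal complement of directions deemed important to protected behaviors:

```python
import numpy as np

def protect_update(update, protected_dirs):
    """Remove the components of `update` that lie along protected
    directions, so the applied edit leaves those directions (and,
    ideally, the abilities they support) unchanged.

    update:         (d,) parameter-update vector
    protected_dirs: (k, d) array of directions to preserve
    """
    update = np.asarray(update, dtype=float)
    P = np.asarray(protected_dirs, dtype=float)
    Q, _ = np.linalg.qr(P.T)  # orthonormal basis of the protected subspace
    return update - Q @ (Q.T @ update)

# Protecting the first coordinate direction strips that component:
print(protect_update([1.0, 1.0], [[1.0, 0.0]]))  # → [0. 1.]
```

The task-level half of the protection, regularizing predictions on held-out tasks, would sit alongside such a projection rather than replace it.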
Merits
Effective Editing
REdit's ability to modify a specific reasoning pattern while preserving other pathways is a significant improvement over existing approaches, which treat reasoning as a monolithic skill and retrain broadly.
Improved Transferability
Meta-Contrastive Learning enables REdit to extend transferability to novel reasoning patterns, enhancing its practical applicability.
Demerits
Computational Complexity
Reshaping circuits before each edit adds computational overhead on top of standard editing, which may limit REdit's scalability to larger models or large batches of edits.
Limited Domain Coverage
The evaluation centers on propositional logic reasoning tasks, with only supplementary validation in mathematics, so it remains unclear how well the approach transfers to other reasoning domains without further experimentation.
Expert Commentary
The paper presents a significant advancement in natural language processing, offering a novel approach to editing reasoning patterns in LLMs. REdit's ability to balance generality and locality is a crucial step toward more reliable and efficient language models. However, further research is needed to address the potential limitations and to explore broader applications. The intersection of REdit with explainability, adversarial robustness, and ethical considerations will be essential areas of future investigation.
Recommendations
- ✓ Further experimentation with REdit on diverse domains and tasks to demonstrate its versatility
- ✓ Investigation into the potential applications of REdit in areas like explainability, adversarial robustness, and fairness