Balancing the Reasoning Load: Difficulty-Differentiated Policy Optimization with Length Redistribution for Efficient and Robust Reinforcement …
arXiv:2603.18533v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have shown exceptional reasoning capabilities, but they also suffer from the issue of overthinking, often generating …
Yinan Xia, Haotian Zhang, Huiming Wang
8 views