Optimize Wider, Not Deeper: Consensus Aggregation for Policy Optimization
arXiv:2603.12596v1 Announce Type: new Abstract: Proximal policy optimization (PPO) approximates the trust region update using multiple epochs of clipped SGD. Each epoch may drift further …
Zelal Su (Lain) Mustafaoglu, Sungyoung Lee, Eshan Balachandar, Risto Miikkulainen, Keshav Pingali
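The abstract refers to PPO's clipped surrogate objective, which is optimized over multiple SGD epochs. A minimal NumPy sketch of that objective (the function name and the `eps=0.2` default are illustrative, not from the paper):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate loss (to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s), per sample
    advantage: advantage estimates, per sample
    eps:       clip range around 1.0 (illustrative default)
    """
    unclipped = ratio * advantage
    # Clip the probability ratio to [1 - eps, 1 + eps] before scaling
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic bound: take the elementwise minimum, negate for a loss
    return -np.minimum(unclipped, clipped).mean()

# Example: a ratio of 1.5 with positive advantage is clipped to 1.2
loss = ppo_clip_loss(np.array([1.5, 0.5]), np.array([1.0, 1.0]))
```

Running several epochs of SGD on this loss over the same batch is what lets the updated policy drift away from the behavior policy that collected the data.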