Delightful Distributed Policy Gradient
arXiv:2603.20521v1 Announce Type: new Abstract: Distributed reinforcement learning trains on data from stale, buggy, or mismatched actors, producing actions with high surprisal (negative log-probability) under …
Ian Osband
14 views