
A Connection between One-Step RL and Critic Regularization in Reinforcement Learning

4 Citations · 2023
Benjamin Eysenbach, M. Geist, S. Levine

Applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL, drawing a connection between these methods.

Abstract

As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One class of methods, known as one-step RL, performs just one step of policy improvement. These methods, which include advantage-weighted regression and conditional behavioral cloning, are thus simple and stable, but can have limited asymptotic performance. A second class of methods, known as critic regularization, performs many steps of policy improvement with a regularized objective. These methods typically require more compute but have appealing lower-bound guarantees. In this paper, we draw a connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly used hyperparameters.
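
To make the contrast concrete, here is a hedged sketch of the two objective families the abstract refers to, written in standard offline-RL notation rather than the paper's own. One-step RL methods such as advantage-weighted regression perform a single policy-improvement step against the behavior policy's critic:

$$\max_{\pi}\; \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[\exp\!\big(A^{\beta}(s,a)/\lambda\big)\,\log\pi(a\mid s)\right],$$

where $\beta$ is the behavior policy, $A^{\beta}$ its advantage function, and $\lambda$ a temperature. Critic-regularization methods such as CQL instead iterate many policy-improvement steps while penalizing the critic on actions sampled from the learned policy:

$$\min_{Q}\; \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\Big[\big(Q(s,a)-\mathcal{B}^{\pi}Q(s,a)\big)^{2}\Big] \;+\; \alpha\Big(\mathbb{E}_{s\sim\mathcal{D},\,a\sim\pi}\big[Q(s,a)\big]-\mathbb{E}_{(s,a)\sim\mathcal{D}}\big[Q(s,a)\big]\Big),$$

where $\mathcal{B}^{\pi}$ is the Bellman operator and $\alpha$ the regularization coefficient. The paper's claim, restated in this notation, is that under its assumptions the multi-step regularized procedure with $\alpha = 1$ recovers the same policy as the one-step objective above.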