
A Connection between One-Step RL and Critic Regularization in Reinforcement Learning

4 Citations · 2023
Benjamin Eysenbach, M. Geist, S. Levine

Applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL, drawing a connection between these methods.

Abstract

As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One class of methods, known as one-step RL, performs just one step of policy improvement. These methods, which include advantage-weighted regression and conditional behavioral cloning, are thus simple and stable, but can have limited asymptotic performance. A second class of methods, known as critic regularization, performs many steps of policy improvement with a regularized objective. These methods typically require more compute but have appealing lower-bound guarantees. In this paper, we draw a connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly used hyperparameters.
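
To make the contrast concrete, here is a hedged sketch of the two objective families the abstract refers to, written in standard offline-RL notation rather than the paper's own. One-step RL methods such as advantage-weighted regression perform a single policy-improvement step against the behavior policy's critic:

$$\max_{\pi}\; \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[\exp\!\big(A^{\beta}(s,a)/\lambda\big)\,\log\pi(a\mid s)\right],$$

where $\beta$ is the behavior policy, $A^{\beta}$ its advantage function, and $\lambda$ a temperature. Critic-regularization methods such as CQL instead iterate many policy-improvement steps while penalizing the critic on actions sampled from the learned policy:

$$\min_{Q}\; \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\Big[\big(Q(s,a)-\mathcal{B}^{\pi}Q(s,a)\big)^{2}\Big] \;+\; \alpha\Big(\mathbb{E}_{s\sim\mathcal{D},\,a\sim\pi}\big[Q(s,a)\big]-\mathbb{E}_{(s,a)\sim\mathcal{D}}\big[Q(s,a)\big]\Big),$$

where $\mathcal{B}^{\pi}$ is the Bellman operator and $\alpha$ the regularization coefficient. The paper's claim, restated in this notation, is that under its assumptions the multi-step regularized procedure with $\alpha = 1$ recovers the same policy as the one-step objective above.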