Top Research Papers on Reinforcement Learning
Discover the key research papers that have shaped the field of reinforcement learning. These papers provide essential insights into the algorithms, methodologies, and applications driving this area of AI. Whether you are a researcher, student, or enthusiast, delve into these top papers to deepen your understanding of reinforcement learning.
Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones
184 Citations 2021 Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair + 7 more
IEEE Robotics and Automation Letters
This work proposes Recovery RL, an algorithm that navigates the tradeoff between task performance and constraint satisfaction by leveraging offline data to learn about constraint-violating zones before policy learning and by separating the two goals across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely.
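The two-policy split described in this summary can be pictured with a minimal sketch. It is not the authors' implementation: the function name `select_action`, the `risk_estimate` callable, and the threshold value are all hypothetical, and the sketch assumes some learned estimate of constraint-violation risk is already available.

```python
from typing import Any, Callable, Tuple


def select_action(
    state: Any,
    task_policy: Callable[[Any], Any],
    recovery_policy: Callable[[Any], Any],
    risk_estimate: Callable[[Any, Any], float],
    risk_threshold: float = 0.3,  # hypothetical value, not taken from the paper
) -> Tuple[Any, bool]:
    """Return (action, used_recovery) for one step.

    The task policy proposes an action that only cares about task reward.
    If the estimated risk of violating a constraint with that action
    exceeds the threshold, the recovery policy's action is used instead.
    """
    proposed = task_policy(state)
    if risk_estimate(state, proposed) > risk_threshold:
        return recovery_policy(state), True
    return proposed, False


# Toy 1-D usage (all values hypothetical): positions >= 1.0 violate the constraint.
task = lambda x: 0.5        # always push forward
recover = lambda x: -0.5    # always back away
risk = lambda x, a: 1.0 if x + a >= 1.0 else 0.0

print(select_action(0.0, task, recover, risk))  # (0.5, False)
print(select_action(0.8, task, recover, risk))  # (-0.5, True)
```

Keeping the switch outside both policies means the task policy never has to trade reward against safety itself, which is the separation of goals the summary refers to.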
CFR-RL: Traffic Engineering With Reinforcement Learning in SDN
162 Citations 2020 Junjie Zhang, Minghao Ye, Zehua Guo + 2 more
IEEE Journal on Selected Areas in Communications
CFR-RL (Critical Flow Rerouting-Reinforcement Learning), a Reinforcement Learning-based scheme that learns a policy to select critical flows for each given traffic matrix automatically and reroutes these selected critical flows to balance link utilization of the network by formulating and solving a simple Linear Programming problem.
RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real
149 Citations 2020 Kanishka Rao, C.J. Harris, Alex Irpan + 3 more
journal unavailable
The RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning, is obtained by incorporating the RL-scene consistency loss into unsupervised domain translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.
RL-Routing: An SDN Routing Algorithm Based on Deep Reinforcement Learning
145 Citations 2020 Yiren Chen, Amir Rezapour, Wen-Guey Tzeng + 1 more
IEEE Transactions on Network Science and Engineering
This work develops a reinforcement learning routing algorithm (RL-Routing) to solve a traffic engineering problem of SDN in terms of throughput and delay. It considers comprehensive network information for state representation and uses one-to-many network configuration for routing choices.
GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
266 Citations 2020 Hanrui Wang, Kuan Wang, Jiacheng Yang + 4 more
journal unavailable
This paper presents GCN-RL Circuit Designer, leveraging reinforcement learning (RL) to transfer the knowledge between different technology nodes and topologies, and demonstrates that RL with transfer learning can achieve much higher FoMs than methods without knowledge transfer.
RL-GA: A Reinforcement Learning-based Genetic Algorithm for Electromagnetic Detection Satellite Scheduling Problem
134 Citations 2023 Yanjie Song, Luona Wei, Qing Yang + 3 more
Swarm and Evolutionary Computation
Experimental verification on multiple instances shows that RL-GA can solve the EDSSP effectively and performs better in several aspects than the state-of-the-art algorithms.
Reinforced model predictive control (RL-MPC) for building energy management
202 Citations 2022 Javier Arroyo, Carlo Manna, Fred Spiessens + 1 more
Applied Energy
sponsorship: This work emerged from the IBPSA Project 1, an international project conducted under the umbrella of the International Building Performance Simulation Association (IBPSA). Project 1 will develop and demonstrate a BIM/GIS and Modelica Framework for building and community energy system design and operation. The work of Javier Arroyo is financed by VITO, Belgium through a PhD Fellowship (grant number 1710754). Finally, the authors wish to thank Brida V. Mbuwir, Jan Drgona, and Iago Cupeiro Figueroa for kindly reviewing the paper.
A Survey of Deep RL and IL for Autonomous Driving Policy Learning
182 Citations 2021 Zeyu Zhu, Huijing Zhao
IEEE Transactions on Intelligent Transportation Systems
This is the first survey to focus on AD policy learning using DRL/DIL, which is addressed simultaneously from the system, task-driven and problem-driven perspectives.
Hierarchical Reinforcement Learning
363 Citations 2021 Shubham Pateria, Budhitama Subagdja, Ah‐Hwee Tan + 1 more
ACM Computing Surveys
A survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL is presented according to a novel taxonomy of the approaches.
Deep Reinforcement Learning
216 Citations 2020 Hao Dong, Zihan Ding, Shanghang Zhang
journal unavailable
This is the first comprehensive and self-contained introduction to deep reinforcement learning, covering all aspects from fundamentals and research to applications. It includes examples and codes to help readers practice and implement the techniques.
Transfer Learning in Deep Reinforcement Learning: A Survey
150 Citations 2020 Zhuangdi Zhu, Kaixiang Lin, Jiayu Zhou + 1 more
arXiv (Cornell University)
Reinforcement learning is a learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in reinforcement learning upon the fast development of deep neural networks. Along with the promising prospects of reinforcement learning in numerous domains such as robotics and game-playing, transfer learning has arisen to tackle various challenges faced by reinforcement learning, by transferring knowledge from external expertise to facilitate the efficiency and effectiveness of the learning process. In this survey, we systematically investigate the r...
Offline Reinforcement Learning with Implicit Q-Learning
129 Citations 2021 Ilya Kostrikov, Ashvin Nair, Sergey Levine
arXiv (Cornell University)
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that ne...
Conservative Q-Learning for Offline Reinforcement Learning
531 Citations 2020 Aviral Kumar, Aurick Zhou, George Tucker + 1 more
arXiv (Cornell University)
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
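One way to picture "conservative" here is a penalty that pushes Q-values down across all actions while pushing them up on the action actually stored in the dataset. The sketch below assumes a small discrete action set and uses the log-sum-exp form of the regularizer commonly associated with CQL. The function names and the weight `alpha` are hypothetical, and this is an illustration under those assumptions rather than the paper's reference code.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values: Sequence[float], dataset_action: int) -> float:
    """Conservative term for one state: soft maximum of Q over all actions
    minus Q of the action observed in the offline dataset.
    It is never negative and grows when unseen actions get high Q-values."""
    return logsumexp(q_values) - q_values[dataset_action]


def cql_loss(
    q_values: Sequence[float],
    dataset_action: int,
    td_target: float,
    alpha: float = 1.0,  # hypothetical trade-off weight
) -> float:
    """Squared TD error on the dataset action plus the weighted penalty."""
    td_error = q_values[dataset_action] - td_target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, dataset_action)


# Inflating the Q-value of an action that is not in the data raises the loss.
print(cql_loss([0.0, 0.0], dataset_action=0, td_target=0.0))
print(cql_loss([0.0, 5.0], dataset_action=0, td_target=0.0))
```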
Deep learning, reinforcement learning, and world models
419 Citations 2022 Yutaka Matsuo, Yann LeCun, Maneesh Sahani + 5 more
Neural Networks
This review of talks and discussions in the "Deep Learning and Reinforcement Learning" session of the symposium, International Symposium on Artificial Intelligence and Brain Science, discusses whether the authors can achieve comprehensive understanding of human intelligence based on the recent advances of deep learning and reinforcement learning algorithms.
Transfer Learning in Deep Reinforcement Learning: A Survey
630 Citations 2023 Zhuangdi Zhu, Kaixiang Lin, Anil K. Jain + 1 more
IEEE Transactions on Pattern Analysis and Machine Intelligence
This survey systematically investigates the recent progress of transfer learning approaches in the context of deep reinforcement learning, and provides a framework for categorizing the state-of-the-art transfer learning approaches, under which it analyzes their goals, methodologies, compatible reinforcement learning backbones, and practical applications.
Reinforcement Learning in Healthcare: A Survey
443 Citations 2021 Chao Yu, Jiming Liu, Shamim Nemati + 1 more
ACM Computing Surveys
This survey provides an extensive overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, automated medical diagnosis, and many other control or scheduling problems that have infiltrated every aspect of the healthcare system.
Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning
209 Citations 2021 Hua Jiang, Liangcai Zeng, Gongfa Li + 1 more
Sensors
A state-of-the-art survey on an intelligent robot with the capability of autonomous deciding and learning reveals that the latest research in deep learning and reinforcement learning has paved the way for highly complex tasks to be performed by robots.
Beyond dichotomies in reinforcement learning
132 Citations 2020 Anne Collins, Jeffrey Cockburn
Nature Reviews Neuroscience
It is argued that the field is well positioned to move beyond simplistic dichotomies, and a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making is proposed.
Deep Reinforcement Learning: A Survey
658 Citations 2022 Xu Wang, Sen Wang, Xingxing Liang + 5 more
IEEE Transactions on Neural Networks and Learning Systems
The fundamental theories, key algorithms, and primary research domains of DRL are summarized, along with value-based and policy-based DRL algorithms and the advances in maximum entropy-based DRL.
Reinforcement Learning with Augmented Data
245 Citations 2020 Michael Laskin, Kimin Lee, Adam Stooke + 3 more
arXiv (Cornell University)
Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inp...
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
227 Citations 2020 Sanmit Narvekar, Bei Peng, Matteo Leonetti + 3 more
arXiv (Cornell University)
This article presents a framework for curriculum learning (CL) in reinforcement learning, and uses it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals.
Game-Theoretic Multiagent Reinforcement Learning
146 Citations 2020 Yaodong Yang, Jun Wang, Zihan Ding + 4 more
arXiv (Cornell University)
Tremendous advances have been made in multiagent reinforcement learning (MARL). MARL corresponds to the learning problem in a multiagent system in which multiple agents learn simultaneously. It is an interdisciplinary field of study with a long history that includes game theory, machine learning, stochastic control, psychology, and optimization. Despite great successes in MARL, there is a lack of a self-contained overview of the literature that covers game-theoretic foundations of modern MARL methods and summarizes the recent advances. The majority of existing surveys are outdated and do not f...
Learning to Operate Distribution Networks With Safe Deep Reinforcement Learning
121 Citations 2022 Hepeng Li, Haibo He
IEEE Transactions on Smart Grid
A safe deep reinforcement learning (SDRL) based method to solve the problem of optimal operation of distribution networks (OODN) as a constrained Markov decision process using a stochastic policy built upon a joint distribution of mixed random variables.
A Minimalist Approach to Offline Reinforcement Learning
163 Citations 2021 Scott Fujimoto, Shixiang Gu
arXiv (Cornell University)
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing the policy with the actions contained in the dataset. Because they are built on pre-existing RL algorithms, the modifications needed to make an RL algorithm work offline come at the cost of additional complexity. Offline RL algorithms introduce new hyperparameters and often leverage secondary components such as generative models, while adjusting the underlying RL algorithm. In this pape...
Chip Placement with Deep Reinforcement Learning
152 Citations 2020 Azalia Mirhoseini, Anna Goldie, Mustafa Ege Yazgan + 19 more
arXiv (Cornell University)
This work presents a learning-based approach to chip placement, and shows that, in under 6 hours, this method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.
Distributional Reinforcement Learning with Quantile Regression
149 Citations 2024 Will Dabney, Mark Rowland, Marc G. Bellemare + 1 more
arXiv (Cornell University)
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods...
Reinforcement-learning in fronto-striatal circuits
128 Citations 2021 Bruno B. Averbeck, John P. O’Doherty
Neuropsychopharmacology
It will be necessary to build bridges from algorithmic level descriptions of computational reinforcement-learning to implementational level models to better understand how reinforcement-learning emerges from multiple distributed neural networks in the brain.
Exploratory Combinatorial Optimization with Reinforcement Learning
140 Citations 2020 Thomas D. Barrett, William R. Clements, Jakob Foerster + 1 more
Proceedings of the AAAI Conference on Artificial Intelligence
The approach of exploratory combinatorial optimization (ECO-DQN) is, in principle, applicable to any combinatorial problem that can be defined on a graph and can be combined with other search methods to further improve performance, which is demonstrated using a simple random search.
Reinforcement Learning with Quantum Variational Circuit
118 Citations 2020 Owen Lockwood, Mei Si
Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment
Results indicate that both hybrid and pure quantum variational circuits have the ability to solve reinforcement learning tasks with a smaller parameter space.
Exploration in deep reinforcement learning: A survey
434 Citations 2022 Paweł Ładosz, Lilian Weng, Minwoo Kim + 1 more
Information Fusion
This review provides a comprehensive overview of existing exploration approaches, which are categorized based on their key contributions as follows: reward novel states, reward diverse behaviours, goal-based methods, probabilistic methods, imitation-based methods, safe exploration, and random-based methods.
Deep Reinforcement Learning for Cyber Security
490 Citations 2021 Thanh Thi Nguyen, Vijay Janapa Reddi
IEEE Transactions on Neural Networks and Learning Systems
This article presents a survey of DRL approaches developed for cyber security, including DRL-based security methods for cyber–physical systems, autonomous intrusion detection techniques, and multiagent DRL-based game theory simulations for defense strategies against cyberattacks.
Deep Reinforcement Learning for Multiobjective Optimization
347 Citations 2020 Kaiwen Li, Tao Zhang, Rui Wang
IEEE Transactions on Cybernetics
The proposed DRL-MOA method provides a new way of solving the MOP by means of DRL that has shown a set of new characteristics, for example, strong generalization ability and fast solving speed in comparison with the existing methods for multiobjective optimizations.
Survey on reinforcement learning for language processing
133 Citations 2022 Víctor Uc-Cetina, Nicolás Navarro-Guerrero, Anabel Martín-González + 2 more
Artificial Intelligence Review
This survey reviews the state of the art of RL methods and their possible use for different NLP problems, focusing primarily on conversational systems because of their growing relevance.
This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include social physical robots and real-world human-robot interactions with users. We present a tho...
Safe reinforcement learning for dynamical games
106 Citations 2020 Yongliang Yang, Kyriakos G. Vamvoudakis, Hamidreza Modares
International Journal of Robust and Nonlinear Control
A novel actor‐critic‐barrier structure is presented for the multiplayer safety‐critical systems where non‐zero‐sum games with full‐state constraints are first transformed into unconstrained NZS games using a barrier function.
Model-based Reinforcement Learning: A Survey
449 Citations 2023 Thomas M. Moerland, Joost Broekens, Aske Plaat + 1 more
Foundations and Trends® in Machine Learning
Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This survey is an integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of p...
Recent advances in reinforcement learning in finance
144 Citations 2023 Ben Hambly, Renyuan Xu, Huining Yang
Mathematical Finance
This survey paper aims to review the recent developments and use of RL approaches in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo‐advising.
Reinforcement learning algorithms: A brief survey
467 Citations 2023 Ashish Kumar Shakya, G. N. Pillai, Sohom Chakrabarty
Expert Systems with Applications
Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential decision-making in complex problems. RL is inspired by trial-and-error based human/animal learning. It can learn an optimal policy autonomously with knowledge obtained by continuous interaction with a stochastic dynamical environment. Problems considered virtually impossible to solve, such as learning to play video games just from pixel information, are now successfully solved using deep reinforcement learning. Without human intervention, RL agents can surpass human performance in challenging tasks. This revie...
Rapid Locomotion via Reinforcement Learning
114 Citations 2022 Gabriel B. Margolis, Ge Yang, Kartik Paigwar + 2 more
journal unavailable
Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy f...
Distral: Robust Multitask Reinforcement Learning
146 Citations 2025 Yee Whye Teh
Oxford University Research Archive (ORA) (University of Oxford)
This work proposes a new approach for joint training of multiple tasks, referred to as Distral (Distill & transfer learning), and shows that the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.