Top Research Papers on Reinforcement Learning

Discover the key research papers that have shaped the field of reinforcement learning. These papers provide essential insights into the algorithms, methodologies, and applications driving this area of AI. Whether you are a researcher, student, or enthusiast, delve into these top papers to deepen your understanding of reinforcement learning.

Learning the Quadruped Robot by Reinforcement Learning (RL)

1 Citation 2022

A. Issa, A. Aldair

Iraqi Journal for Electrical and Electronic Engineering

DDPG reinforcement learning is proposed to generate and improve the walking motion of the quadruped robot, and the results show that the behaviour of the walking robot has been improved compared with the previous cases.

Dueling RL: Reinforcement Learning with Trajectory Preferences

66 Citations 2021

Aldo Pacchiano, Aadirupa Saha, Jonathan Lee

journal unavailable

This work is one of the first to give tight regret guarantees for preference-based RL problems with trajectory preferences, where the trajectory preferences are encoded by a generalized linear model of dimension $d$.

RL-GRIT: Reinforcement Learning for Grammar Inference

4 Citations 2021

Walt Woods

2021 IEEE Security and Privacy Workshops (SPW)

This work proposes a novel set of mechanisms for grammar inference, RL-GRIT, and shows that RL can be used to surpass the expressiveness of both classes, and offers a clear path to learning context-sensitive languages.

RL-OPC: Mask Optimization With Deep Reinforcement Learning

6 Citations 2024

Xiaoxiao Liang, Yikang Ouyang, Haoyu Yang + 2 more

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

This article pioneers the use of a reinforcement learning (RL) model for mask optimization, which directly optimizes the preferred objective without leveraging a differentiable proxy and outperforms state-of-the-art solutions.

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

6 Citations 2024

Shaoteng Liu, Haoqi Yuan, Minda Hu + 5 more

ArXiv

This work introduces a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent: the slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.

RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization

12 Citations 2023

Zhecheng Yuan, Sizhe Yang, Pu Hua + 4 more

ArXiv

RL-ViGen is a novel Reinforcement Learning Benchmark for Visual Generalization, which contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions and laying a foundation for the future creation of universal visual generalization RL agents suitable for real-world scenarios.

Visuotactile-RL: Learning Multimodal Manipulation Policies with Deep Reinforcement Learning

27 Citations 2022

Johanna Hansen, F. Hogan, D. Rivkin + 3 more

2022 International Conference on Robotics and Automation (ICRA)

This paper focuses on the problem setting where both visual and tactile sensors provide pixel-level feedback for Visuotactile reinforcement learning agents, and investigates the challenges associated with multimodal learning and proposes several improvements to existing RL methods.

RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

47 Citations 2016

Yan Duan, John Schulman, Xi Chen + 3 more

journal unavailable

This paper proposes to represent a “fast” reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data, and shows that its performance on new MDPs is close to human-designed algorithms with optimality guarantees.

Distributed Reinforcement Learning with ADMM-RL

17 Citations 2019

P. Graf, J. Annoni, C. Bay + 4 more

2019 American Control Conference (ACC)

A new algorithm for distributed Reinforcement Learning (RL), a combination of the Alternating Direction Method of Multipliers (ADMM) and reinforcement learning that allows for integrating learned controllers as subsystems in generally convergent distributed control applications.

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

7 Citations 2024

Shengyi Huang, Quentin Gallouédec, Florian Felten + 30 more

ArXiv

This document presents Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics, and is the first RL benchmark of its kind.

RL-DARTS: Differentiable Architecture Search for Reinforcement Learning

8 Citations 2021

Yingjie Miao, Xingyou Song, Daiyi Peng + 3 more

ArXiv

Throughout this training process, it is shown that the supernet gradually learns better cells, leading to alternative architectures that can be highly competitive against manually designed policies and that also verify previous design choices for RL policies.

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

13 Citations 2023

Harshit S. Sikchi, Qinqing Zheng, Amy Zhang + 1 more

journal unavailable

This work casts several state-of-the-art offline RL and offline imitation learning algorithms as instances of dual RL approaches with shared structures, and proposes a new discriminator-free method ReCOIL that learns to imitate from arbitrary off-policy data to obtain near-expert performance.

SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning

4 Citations 2024

Nicholas Zolman, Urban Fasel, J. Kutz + 1 more

ArXiv

SINDy-RL is introduced, a unifying framework for combining SINDy and DRL to create efficient, interpretable, and trustworthy representations of the dynamics model, reward function, and control policy that results in an interpretable control policy orders of magnitude smaller than a deep neural network policy.

RL Unplugged: Benchmarks for Offline Reinforcement Learning

69 Citations 2020

Caglar Gulcehre, Ziyun Wang, Alexander Novikov + 15 more

ArXiv

This paper proposes a benchmark called RL Unplugged to evaluate and compare offline RL methods, a suite of benchmarks that will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

RL-MUL: Multiplier Design Optimization with Deep Reinforcement Learning

7 Citations 2023

Dongsheng Zuo, Yikang Ouyang, Yuzhe Ma

2023 60th ACM/IEEE Design Automation Conference (DAC)

A multiplier design optimization framework based on reinforcement learning is proposed, utilizing matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network.

RL-IoT: Reinforcement Learning to Interact with IoT Devices

2 Citations 2021

Giulia Milan, L. Vassio, I. Drago + 1 more

2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS)

Our life is getting filled by Internet of Things (IoT) devices. These devices often rely on closed or poorly documented protocols, with unknown formats and semantics. Learning how to interact with such devices in an autonomous manner is the key for interoperability and automatic verification of their capabilities. In this paper, we propose RL-IoT, a system that explores how to automatically interact with possibly unknown IoT devices. We leverage reinforcement learning (RL) to recover the semantics of protocol messages and to take control of the device to reach a given goal, while minimizing th...

RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

960 Citations 2016

Yan Duan, John Schulman, Xi Chen + 3 more

ArXiv

This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data, encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.

EER-RL: Energy-Efficient Routing Based on Reinforcement Learning

23 Citations 2021

Vially Kazadi Mutombo, Seungyeon Lee, Jusuk Lee + 1 more

Mob. Inf. Syst.

The proposed EER-RL, an energy-efficient routing protocol based on reinforcement learning, allows devices to adapt to network changes, such as mobility and energy level, and improve routing decisions. The results show that the proposed protocol performs better in terms of energy efficiency, network lifetime, and scalability.

RL-MLZerD: Multimeric protein docking using reinforcement learning

13 Citations 2022

Tunde Aderinwale, Charles W Christoffer, D. Kihara

Frontiers in Molecular Biosciences

A novel method is introduced, RL-MLZerD, which builds multiple protein complexes using reinforcement learning (RL), and it emerged that the docking order of multi-chain complexes can be naturally predicted by examining preferred paths of episodes in the RL computation.

Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones

193 Citations 2020

Brijen Thananjeyan, A. Balakrishna, Suraj Nair + 7 more

IEEE Robotics and Automation Letters

This work proposes Recovery RL, an algorithm which navigates this tradeoff by leveraging offline data to learn about constraint violating zones before policy learning and separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely.

RL-EMO: A Reinforcement Learning Framework for Multimodal Emotion Recognition

2 Citations 2024

Chengwen Zhang, Yuhao Zhang, Bo Cheng

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

A novel Reinforcement Learning framework for the multimodal EMOtion recognition task (RL-EMO), which combines a Multi-modal Graph Convolution Network (MMGCN) module with a novel Reinforcement Learning (RL) module to model context at both the semantic and emotional levels respectively.

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

93 Citations 2022

Marc Rigter, Bruno Lacerda, Nick Hawes

ArXiv

This work presents Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL that addresses the problem as a two-player zero-sum game against an adversarial environment model and demonstrates that it outperforms existing state-of-the-art baselines.

Hyp-RL : Hyperparameter Optimization by Reinforcement Learning

61 Citations 2019

H. Jomaa, Josif Grabocka, L. Schmidt-Thieme

ArXiv

Experiments demonstrate that the model does not have to rely on a heuristic acquisition function like SMBO, but can learn which hyperparameters to test next based on the subsequent reduction in validation loss they will eventually lead to, and outperforms the state-of-the-art approaches for hyperparameter learning.

Evo-RL: evolutionary-driven reinforcement learning

13 Citations 2020

Ahmed Hallawa, Thorsten Born, A. Schmeink + 6 more

Proceedings of the Genetic and Evolutionary Computation Conference Companion

Results show that reinforcement learning algorithms embedded within the Evolutionary-Driven Reinforcement Learning approach significantly outperform the stand-alone versions of the same RL algorithms on OpenAI Gym control problems with rewardless states constrained by the same computational budget.

PR-RL: Portrait Relighting Via Deep Reinforcement Learning

7 Citations 2021

Xiaoyan Zhang, Yukai Song, Zhuopeng Li + 1 more

IEEE Transactions on Multimedia

The experiments show that the proposed PR-RL method outperforms state-of-the-art methods in generating locally effective and interpretable high resolution relighting results for wild portrait images.

RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup

1 Citation 2023

Nico Bohlinger, Klaus Dorer

journal unavailable

The new Deep Reinforcement Learning library RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks is presented.

Reinforcement Learning (RL) for optimal power allocation in 6G Network

1 Citation 2023

Anutusha Dogra, R. Jha, K. R. Jha

2022 OPJU International Technology Conference on Emerging Technologies for Sustainable Development (OTCON)

An RL-based approach in which the agent will select the access point with the highest SNR for establishing a communication link with the user is proposed, which helps in effective spectrum utilization and minimizing power wastage.

RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition

2 Citations 2023

Masato Fujitake

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

A deep reinforcement learning localization method for logo recognition (RL-LOGO) utilizes deep reinforcement learning to identify a logo region in images without annotations of the positions, thereby improving classification accuracy.

RL-CAM: Visual Explanations for Convolutional Networks using Reinforcement Learning

12 Citations 2023

S. Sarkar, Ashwin Ramesh Babu, Sajad Mousavi + 5 more

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

This work proposes a novel approach that uses reinforcement learning to generate a visual explanation for CNNs, and demonstrates that the proposed method outperforms the existing techniques, producing more accurate localization masks of regions of interest in the input images.

Fabricatio-RL: A Reinforcement Learning Simulation Framework For Production Scheduling

5 Citations 2021

Alexandru Rinciog, Anne Meyer

2021 Winter Simulation Conference (WSC)

This work introduces FabricatioRL, an RL-compatible, customizable, and extensible benchmarking simulation framework that can interface with both traditional approaches and RL, ensuring that generic production setups can be covered and that experiments are reproducible.

RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real

174 Citations 2020

Kanishka Rao, Chris Harris, A. Irpan + 3 more

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

The RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning, is obtained by incorporating the RL-scene consistency loss into unsupervised domain translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.

Heading Control of a Ship Based on Deep Reinforcement Learning (RL)

3 Citations 2022

Sivaraman Sivaraj, S. Rajendran

OCEANS 2022 - Chennai

The heading control of a KVLCC2 tanker in calm water and waves is investigated and the ship dynamics is represented using a 3DoF numerical model.

Retro-RL: Reinforcing Nominal Controller With Deep Reinforcement Learning for Tilting-Rotor Drones

5 Citations 2022

I Made Aswin Nahrendra, Christian Tirtawardhana + 3 more

IEEE Robotics and Automation Letters

This letter proposes a novel hybrid architecture that reinforces a nominal controller with a robust policy learned using a model-free deep RL algorithm and employs an uncertainty-aware control mixer to preserve the guaranteed stability of the nominal controller while using the extended robust performance of the learned policy.

DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems

24 Citations 2022

Pierre Schumacher, D. Haeufle, Dieter Büchler + 2 more

ArXiv

Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast amount of muscles. Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance. We conjecture that ineffective exploration in large overactuated action spaces is a key problem. This is supported by the finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems. We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able...

CFR-RL: Traffic Engineering With Reinforcement Learning in SDN

112 Citations 2020

Junjie Zhang, Minghao Ye, Zehua Guo + 2 more

IEEE Journal on Selected Areas in Communications

CFR-RL (Critical Flow Rerouting-Reinforcement Learning), a Reinforcement Learning-based scheme that learns a policy to select critical flows for each given traffic matrix automatically and reroutes these selected critical flows to balance link utilization of the network by formulating and solving a simple Linear Programming problem.

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

115 Citations 2020

Caglar Gulcehre, Ziyun Wang, Alexander Novikov + 15 more

journal unavailable

This paper proposes a benchmark called RL Unplugged to evaluate and compare offline RL methods, and proposes detailed evaluation protocols for each domain in RL Unplugged and provides an extensive analysis of supervised learning and offline RL methods using these protocols.

Quafu-RL: The Cloud Quantum Computers based Quantum Reinforcement Learning

6 Citations 2023

Baqis Quafu Group

journal unavailable

The experimental results demonstrate that the reinforcement learning agents are capable of achieving goals that are slightly relaxed during both the training and inference stages. The work also designs hardware-efficient PQC architectures in the quantum model using a multi-objective evolutionary algorithm and develops a learning algorithm that is adaptable to Quafu.

Conformer‐RL: A deep reinforcement learning library for conformer generation

1 Citation 2022

Runxuan Jiang, Tarun Gogineni, Joshua A Kammeraad + 3 more

Journal of Computational Chemistry

Conformer‐RL is an open‐source Python package for applying deep reinforcement learning (RL) to the task of generating a diverse set of low‐energy conformations for a single molecule and contains modular class interfaces for RL environments and agents, allowing users to easily swap components with their own implementations.

RL-Controller: a reinforcement learning framework for active structural control

2 Citations 2021

S. S. Eshkevari, S. S. Eshkevari, Debarshi Sen + 1 more

ArXiv

This paper presents a novel RL-based approach for designing active controllers by introducing RL-Controller, a flexible and scalable simulation environment that includes attributes and functionalities that are defined to model active structural control mechanisms in detail.

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

1 Citation 2025

Tian Xie, Zitian Gao, Qingnan Ren + 7 more

journal unavailable

After training on just 5K logic problems, the 7B model develops advanced reasoning skills (such as reflection, verification, and summarization) that are absent from the logic corpus, and it generalizes to the challenging math benchmarks AIME and AMC.

RL: Generic reinforcement learning codebase in TensorFlow

No citations 2019

Bryan M. Li, A. Cowen-Rivers, Piotr Kozakowski + 6 more

J. Open Source Softw.

This work argues that the five fundamental properties of a sophisticated research codebase are modularity, reproducibility, many RL algorithms pre-implemented, speed, and ease of running on different hardware/integration with visualization packages.

Quafu-RL: The cloud quantum computers based quantum reinforcement learning

1 Citation 2024

Yu-Xin Jin (靳羽欣), Hong-Ze Xu (许宏泽), Zheng-An Wang (王正安) + 29 more

Chinese Physics B

This work takes the first step towards executing benchmark quantum reinforcement problems on real devices equipped with at most 136 qubits on the BAQIS Quafu quantum computing cloud and designs hardware-efficient PQC architectures in the quantum model using a multi-objective evolutionary algorithm and develops a learning algorithm that is adaptable to quantum devices.

TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning

1 Citation 2024

Shivam Shandilya, Menglin Xia, Supriyo Ghosh + 4 more

ArXiv

This work proposes a novel and efficient reinforcement learning (RL) based task-aware prompt compression method that leverages an existing Transformer encoder-based token classification model while guiding the learning process with task-specific reward signals using the lightweight REINFORCE algorithm.

A Connection between One-Step RL and Critic Regularization in Reinforcement Learning

4 Citations 2023

Benjamin Eysenbach, M. Geist, S. Levine + 1 more

journal unavailable

Applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL, drawing a connection between these methods.

RLS-DTS: Reinforcement-Learning Linguistic Steganalysis in Distribution-Transformed Scenario

4 Citations 2023

Yihao Wang, Ru Zhang, Jianyi Liu

IEEE Signal Processing Letters

This letter proposes a reinforcement learning-based method for linguistic steganalysis that employs an agent (steganalyzer) to interact within an observation space, enabling adaptation to the characteristics of the transformed data and capturing steganalysis features.

Flatland-RL : Multi-Agent Reinforcement Learning on Trains

53 Citations 2020

S. Mohanty, Erik Nygren, Florian Laurent + 11 more

ArXiv

This work introduces a simplified two-dimensional grid environment called "Flatland" that allows for faster experimentation, demonstrates that ML has potential for solving the vehicle rescheduling problem (VRSP) on Flatland, and identifies key topics that need further research.

DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies

14 Citations 2021

Soroush Nasiriany, Vitchyr H. Pong, Ashvin Nair + 3 more

2021 IEEE International Conference on Robotics and Automation (ICRA)

This paper proposes goal distributions as a general and broadly applicable task representation suitable for contextual policies and develops an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn these policies.

Soar-RL : Integrating Reinforcement Learning with Soar

2 Citations 2004

Shelley Nason

journal unavailable

An architectural modification to Soar is described that gives a Soar agent the opportunity to learn statistical information about the past success of its actions and utilize this information when selecting an operator.

A Review of Current Perspective and Propensity in Reinforcement Learning (RL) in an Orderly Manner

3 Citations 2023

Shweta Pandey, Rohit Agarwal, Sachin Bhardwaj + 3 more

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This essay thoroughly reviews the state of the art in RL and its applications across a wide range of industries, including smart grids, robots, computer vision, healthcare, gaming, transportation, finance, and engineering.

MPR-RL: Multi-Prior Regularized Reinforcement Learning for Knowledge Transfer

8 Citations 2022

Quantao Yang, J. A. Stork, Todor Stoyanov

IEEE Robotics and Automation Letters

The Multi-Prior Regularized RL (MPR-RL) method is deployed directly on a real world Franka Panda arm, requiring only a set of demonstrated trajectories from similar, but crucially not identical, problem instances.