Reinforcement Learning
Reinforcement Learning (RL) lets robots learn control policies through trial-and-error interaction with their environment rather than relying on hand-coded behaviors. This page surveys core RL approaches, their robotic applications, and a curated set of learning resources and software tools.
Value-Based Methods
Q-Learning & SARSA – Tabular methods for discrete state–action spaces
Deep Q-Networks (DQN) & Variants (Double DQN, Dueling DQN) – Neural-network approximators for high-dimensional inputs
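As a rough illustration of the tabular case, the sketch below runs Q-learning on a made-up five-state corridor. The environment, the function name `train_q_learning`, and all hyperparameter values are assumptions chosen for illustration; they do not come from any of the libraries listed on this page.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so early random walks terminate
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])      # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            done = s_next == goal
            r = 1.0 if done else 0.0
            # Off-policy TD target: bootstrap from the best next action.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


q_table = train_q_learning()
# Greedy action for each non-terminal state.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a])
                 for s in range(len(q_table) - 1)]
```

SARSA would differ only in the target: it bootstraps from the action actually taken in the next state instead of the maximizing one. DQN replaces the table `q` with a neural network and adds a replay buffer and a target network.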
Policy-Gradient Methods
REINFORCE – Monte-Carlo policy search
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) – Stable on-policy updates
Actor-Critic (A2C, A3C) – Combines policy gradient with value estimates
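A minimal sketch of REINFORCE with a tabular softmax policy, again on a hypothetical corridor task. The environment, function names, and hyperparameters are illustrative assumptions, and the per-step discount factor on the update is omitted, as is common in practice.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Monte-Carlo return G_t = r_t + gamma * G_{t+1} for every step."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def run_reinforce(n_states=3, episodes=2000, lr=0.1, gamma=0.9,
                  max_steps=20, seed=0):
    """REINFORCE on a hypothetical corridor: action 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    theta = [[0.0, 0.0] for _ in range(n_states)]  # per-state preferences
    for _ in range(episodes):
        s, trajectory, rewards = 0, [], []
        for _ in range(max_steps):  # sample one full episode first
            probs = softmax(theta[s])
            a = 0 if rng.random() < probs[0] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            trajectory.append((s, a))
            rewards.append(1.0 if s_next == goal else 0.0)
            s = s_next
            if s == goal:
                break
        # Then update: theta += lr * G_t * grad log pi(a_t | s_t).
        for (s_t, a_t), g_t in zip(trajectory,
                                   discounted_returns(rewards, gamma)):
            probs = softmax(theta[s_t])
            for a in range(2):
                grad_log = (1.0 if a == a_t else 0.0) - probs[a]
                theta[s_t][a] += lr * g_t * grad_log
    return theta


theta = run_reinforce()
policy_probs = [softmax(row) for row in theta]
```

Actor-critic methods replace the Monte-Carlo return `g_t` with an advantage estimate from a learned value function, which lowers variance. PPO and TRPO additionally limit how far each update can move the policy.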
Continuous-Control Algorithms
Deep Deterministic Policy Gradient (DDPG) & Twin Delayed DDPG (TD3) – Off-policy actor-critic for continuous actions
Soft Actor-Critic (SAC) – Maximum-entropy RL for robustness
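To make the "twin" and "delayed" ideas a little more concrete, here is a sketch of how a TD3-style critic target could be computed for a single transition. Scalar states and actions, the function name `td3_target`, and the stand-in target networks passed as plain callables are all simplifying assumptions; real implementations operate on batches of tensors.

```python
import random


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """Critic target for one transition, TD3-style.

    target_actor(s) -> a and target_q*(s, a) -> value are hypothetical
    callables standing in for the target networks.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       -act_limit, act_limit)
    # Clipped double-Q: take the smaller of the two target critics
    # to counteract overestimation.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Toy usage with hand-written "networks" and the noise switched off.
y = td3_target(
    reward=1.0, next_state=1.0, done=False,
    target_actor=lambda s: 2.0 * s,        # clipped to act_limit below
    target_q1=lambda s, a: s + a,
    target_q2=lambda s, a: s * a + 0.5,
    gamma=0.5, noise_std=0.0,
)
```

DDPG uses a single critic and no smoothing noise in the target. SAC keeps the two critics but samples the next action from a stochastic policy and adds an entropy bonus to the target.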
Model-Based and Hybrid Methods
Model-Based Policy Optimization (MBPO) – Leverages learned dynamics models
Guided Policy Search – Uses trajectory optimization to supervise policy learning
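The core idea behind learned-dynamics methods can be sketched as follows: fit a model to real transitions, then generate short synthetic rollouts that branch off from real states. MBPO itself uses an ensemble of probabilistic neural networks together with SAC; the scalar linear model, the function names, and the toy system below are stand-ins assumed purely for illustration.

```python
import random


def fit_linear_dynamics(transitions):
    """Least-squares fit of s' ~ a*s + b*u from (s, u, s') tuples."""
    sss = sum(s * s for s, u, _ in transitions)
    suu = sum(u * u for s, u, _ in transitions)
    ssu = sum(s * u for s, u, _ in transitions)
    ssy = sum(s * y for s, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sss * suu - ssu * ssu  # 2x2 normal equations, solved in closed form
    a = (ssy * suu - suy * ssu) / det
    b = (suy * sss - ssy * ssu) / det
    return a, b


def branched_rollouts(model, start_states, policy, horizon):
    """Roll the learned model forward for a few steps from real states."""
    a, b = model
    synthetic = []
    for s in start_states:
        for _ in range(horizon):
            u = policy(s)
            s_next = a * s + b * u
            synthetic.append((s, u, s_next))
            s = s_next
    return synthetic


# Collect "real" data from a hypothetical system s' = 0.9*s + 0.5*u.
rng = random.Random(0)
real, s = [], 1.0
for _ in range(50):
    u = rng.uniform(-1.0, 1.0)
    s_next = 0.9 * s + 0.5 * u
    real.append((s, u, s_next))
    s = s_next

model = fit_linear_dynamics(real)
synthetic = branched_rollouts(model, [t[0] for t in real[:10]],
                              policy=lambda s: -0.5 * s, horizon=3)
```

The synthetic transitions would then be mixed into the replay buffer of a model-free learner. Keeping `horizon` short limits how much model error compounds along each rollout.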
Multi-Agent and Hierarchical RL
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) – Cooperative and competitive settings
Hierarchical RL (options framework) – Temporal abstractions for long-horizon tasks
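In the options framework, an option bundles an initiation set, an intra-option policy, and a termination condition. The sketch below shows one possible way to represent and execute such an option; the `Option` class, `run_option`, and the corridor environment are hypothetical names introduced here, and the termination condition is deterministic for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set
    policy: Callable[[int], int]        # intra-option policy
    should_stop: Callable[[int], bool]  # termination condition


def run_option(option, state, step, gamma=0.9, max_steps=50):
    """Run an option until it terminates.

    Returns (final_state, discounted_reward, steps_taken), which is what a
    higher-level learner needs to treat the option as one temporally
    extended action.
    """
    if not option.can_start(state):
        raise ValueError("option is not available in this state")
    total, discount, k = 0.0, 1.0, 0
    while k < max_steps:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if option.should_stop(state):
            break
    return state, total, k


def corridor_step(state, action):
    """Hypothetical environment: move by `action`, reward 1 at state 4."""
    next_state = state + action
    return next_state, (1.0 if next_state == 4 else 0.0)


go_right = Option(can_start=lambda s: s < 4,
                  policy=lambda s: 1,
                  should_stop=lambda s: s >= 4)
final_state, option_return, steps = run_option(go_right, 0, corridor_step,
                                               gamma=0.5)
```

A high-level policy would choose among several such options and update its value estimates with `gamma ** steps` as the effective discount for each option call.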
Locomotion & Legged Control
Learning stable walking and running gaits on quadrupeds and bipeds
Manipulation & Grasping
End-to-end policies for pick-and-place, tool use, and dexterous in-hand manipulation
Navigation & Mobile Robotics
Maze solving, obstacle avoidance, and mapless navigation with deep RL
Sim-to-Real Transfer
Aerial Robotics
Autonomous flight control for drones via RL
By combining these algorithms, platforms, and learning pathways, practitioners can accelerate the deployment of RL-powered robots, from simulated prototypes to real-world autonomy.
Sutton & Barto’s “Reinforcement Learning: An Introduction”
OpenAI Baselines DQN implementation
OpenAI Spinning Up tutorials
Stable Baselines3 implementations
Survey: “Reinforcement Learning in Robotic Applications”
NVIDIA’s Legged Gym environments
Dex-Net grasp planner with RL integration
ROS-Gazebo RL tutorials
Domain Randomization and Sim-to-Real pipelines in NVIDIA Isaac Sim
Microsoft AirSim environments
OpenAI Gym & Gym-Robotics
ROS RL Packages & ROS-Gym Bridges
NVIDIA Isaac RL & Isaac Gym
Ray RLlib: Scalable RL library
Unity ML-Agents: Game-engine–based RL
Intel Coach: Research RL framework
Coursera “Reinforcement Learning Specialization” by University of Alberta
Udacity “Deep Reinforcement Learning Nanodegree”
The Construct Academy “Reinforcement Learning for Robotics”
30 Days Coding “RL for Robotics: Locomotion & Navigation”
Kober, Bagnell & Peters (2013), “Reinforcement Learning in Robotics: A Survey”
Deisenroth, Neumann & Peters (2011), “A Survey on Policy Search for Robotics”
Singh et al. (2021), “Reinforcement Learning in Robotic Applications: A Comprehensive Survey”