# Reinforcement Learning (classical foundations)

{% hint style="info" %}
**This page covers the classical RL foundations.** For the modern post-2023 stack - PPO with privileged learning for legged locomotion, SERL/HIL-SERL, Eureka (LLM-generated rewards), foundation-model policies, sim-to-real, world models - see the dedicated [Robot Learning](/robotics-handbook/robot-learning/robot-learning.md) section.
{% endhint %}

Reinforcement Learning (RL) lets robots learn control policies through trial-and-error interaction with their environment instead of relying on hand-coded behaviors. This page surveys core RL approaches, their robotic applications, and a curated set of learning resources and software tools.

### Core RL Algorithms and Resources

* Value-Based Methods
  * Q-Learning & SARSA – Tabular methods for discrete state–action spaces
    * Sutton & Barto’s “Reinforcement Learning: An Introduction”
  * Deep Q-Networks (DQN) & Variants (Double DQN, Dueling DQN) – Neural-network approximators for high-dimensional inputs
    * OpenAI Baselines DQN implementation
* Policy-Gradient Methods
  * REINFORCE – Monte Carlo policy-gradient method
  * Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) – Stable on-policy updates
    * OpenAI Spinning Up tutorials ([https://spinningup.openai.com](https://spinningup.openai.com/))
  * Actor-Critic (A2C, A3C) – Combines policy gradients with value estimates
* Continuous-Control Algorithms
  * Deep Deterministic Policy Gradient (DDPG) & Twin Delayed DDPG (TD3) – Off-policy actor-critic for continuous actions
  * Soft Actor-Critic (SAC) – Maximum-entropy RL for robustness
    * Stable Baselines3 implementations
* Model-Based and Hybrid Methods
  * Model-Based Policy Optimization (MBPO) – Leverages learned dynamics models
  * Guided Policy Search – Uses trajectory optimization to supervise policy learning
    * Survey: “Reinforcement Learning in Robotic Applications”
* Multi-Agent and Hierarchical RL
  * Multi-Agent Deep Deterministic Policy Gradient (MADDPG) – Cooperative and competitive settings
  * Hierarchical RL (options framework) – Temporal abstractions for long-horizon tasks

### Robotics Applications

* Locomotion & Legged Control
  * Learning stable walking and running gaits on quadrupeds and bipeds
  * Legged Gym environments (from ETH Zurich's Robotic Systems Lab, built on NVIDIA Isaac Gym)
* Manipulation & Grasping
  * End-to-end policies for pick-and-place, tool use, and dexterous in-hand manipulation
  * Dex-Net grasp planner with RL integration
* Navigation & Mobile Robotics
  * Maze solving, obstacle avoidance, and mapless navigation with deep RL
  * ROS-Gazebo RL tutorials
* Sim-to-Real Transfer
  * Domain randomization and sim-to-real pipelines in NVIDIA Isaac Sim
* Aerial Robotics
  * Autonomous flight control for drones via RL
  * Microsoft AirSim environments

### Software Frameworks & Toolkits

* OpenAI Gym & Gym-Robotics
* ROS RL Packages & ROS-Gym Bridges
* NVIDIA Isaac RL & Isaac Gym
* Ray RLlib: Scalable RL library
* Unity ML-Agents: Game-engine–based RL
* Intel Coach: Research RL framework

### Online Courses & Tutorials

* Coursera “Reinforcement Learning Specialization” by University of Alberta
* Udacity “Deep Reinforcement Learning Nanodegree”
* The Construct Academy “Reinforcement Learning for Robotics”
* 30 Days Coding “RL for Robotics: Locomotion & Navigation”

### Key Survey Papers

* Kober, Bagnell & Peters (2013), “Reinforcement Learning in Robotics: A Survey”
* Deisenroth, Neumann & Peters (2011), “A Survey on Policy Search for Robotics”
* Singh et al. (2021), “Reinforcement Learning in Robotic Applications: A Comprehensive Survey”

By combining these algorithms, platforms, and learning pathways, practitioners can move RL-powered robots from simulated prototypes toward real-world autonomy.
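As a concrete starting point for the tabular methods listed under Value-Based Methods, the following is a minimal sketch of Q-learning with an epsilon-greedy behavior policy. The environment is a hypothetical six-state corridor invented for illustration (the agent starts at the left end and receives a reward of 1 on reaching the right end), and the function names and hyperparameter values are assumptions, not taken from any of the resources above.

```python
import random


def train_q_learning(n_states=6, episodes=2000, alpha=0.1, gamma=0.9,
                     epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning on a toy 1-D corridor (hypothetical environment).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state ends the episode with reward 1; all other
    transitions give reward 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])

            # Deterministic transition, clipped to the corridor bounds.
            s_next = min(max(s + moves[a], 0), goal)
            done = s_next == goal
            r = 1.0 if done else 0.0

            # Q-learning update: bootstrap from the greedy value of s_next.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if done:
                break
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

Because the update bootstraps from `max(q[s_next])` rather than from the action actually taken next, this is the off-policy Q-learning rule; replacing that term with the value of the next epsilon-greedy action would give SARSA instead. On this corridor the learned greedy policy should choose "right" in every non-terminal state.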