Abstract
This is a crash course on applying reinforcement learning to train policies that balance real legged robots. We first review the necessary basics: partially-observable Markov decision processes, value functions, and the goal of reinforcement learning. We then turn to policy optimization: REINFORCE, policy gradients, and proximal policy optimization (PPO). We conclude with techniques to train real-robot policies from simulation data: domain randomization, simulation augmentation, teacher-student distillation, reward shaping, ...
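For reference, here are the two key formulas behind the policy-optimization part, in their standard form. REINFORCE estimates the gradient of the expected return $J(\theta)$ of a policy $\pi_\theta$ from sampled trajectories $\tau$, while PPO maximizes a clipped surrogate objective:

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \right],
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[ \min\big( r_t(\theta)\, \hat{A}_t,\ \operatorname{clip}(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon)\, \hat{A}_t \big) \right],
$$

where $R(\tau)$ is the return of trajectory $\tau$, $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the probability ratio, $\hat{A}_t$ an advantage estimate, and $\epsilon$ the clipping parameter.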
Content
- Slides
- Slides source (CC-BY-4.0 license)
Example

On Linux, you can train and run the open-source PPO balancer for Upkie wheeled bipeds:
$ git clone https://github.com/upkie/ppo_balancer.git
$ cd ppo_balancer
$ conda env create -f environment.yaml
$ conda activate ppo_balancer
$ make show_training
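For a sense of what PPO training looks like in code, here is a minimal sketch using Stable-Baselines3 on a standard Gymnasium task. It illustrates the algorithm under assumed off-the-shelf tooling, not the balancer's actual training script; `Pendulum-v1` stands in for a robot environment:

```python
# Minimal PPO training sketch with Stable-Baselines3 (assumed tooling,
# not the ppo_balancer training script).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")  # continuous-control stand-in for a robot env
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)  # optimize the policy with PPO

# Roll out the trained policy for one episode
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```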
References
Lectures
This lecture was given in the following courses:
- Fall 2024 class at Master MVA, Paris.
- Fall 2024 class at Mines de Paris, Paris.
- Fall 2024 class at École normale supérieure, Paris.
- Fall 2023 class at Master MVA, Paris.
- Fall 2023 class at Mines de Paris, Paris.
- Fall 2023 class at École normale supérieure, Paris.
Discussion
Feel free to post a comment by e-mail. Your e-mail address will not be disclosed.