Reinforcement learning for legged robots

Stéphane Caron. Fall 2024 class at Master MVA, Mines de Paris and École normale supérieure, Paris.

Abstract

This is a crash course on applying reinforcement learning to train policies that balance real legged robots. We first review the necessary basics: partially observable Markov decision processes, value functions, and the goal of reinforcement learning. We then focus on policy optimization: REINFORCE, policy gradients, and proximal policy optimization (PPO). Finally, we turn to techniques for training real-robot policies from simulation data, including domain randomization, simulation augmentation, teacher-student distillation, and reward shaping.
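
As an illustration of the last policy-optimization method listed above, here is a minimal sketch of the clipped surrogate objective from the PPO paper, written in plain Python. It is not taken from the course material: the function name, the default clip_eps=0.2 and the sample batch are assumptions chosen for illustration only.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch (to be maximized).

    The name, signature and default clip_eps are illustrative assumptions.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio within [1 - clip_eps, 1 + clip_eps]
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice: the smaller of the two surrogate terms
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


# Hypothetical batch of three transitions (made-up numbers)
old_log_probs = [math.log(0.5), math.log(0.5), math.log(0.5)]
new_log_probs = [math.log(0.75), math.log(0.5), math.log(0.25)]
advantages = [1.0, 1.0, -1.0]
objective = ppo_clipped_objective(new_log_probs, old_log_probs, advantages)
```

In this batch, the first and third transitions have ratios of 1.5 and 0.5 that fall outside the clipping range, so their contributions are computed from the clipped ratios 1.2 and 0.8. The objective therefore no longer changes when the policy moves further in those directions, which removes the incentive for large policy updates.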

Content

	Slides
	Source of teaching material (CC-BY-4.0 license)

Example

Upkie robot balancing in simulation and in the real world

On Linux, you can train and run the open-source PPO balancer for Upkie wheeled bipeds:

$ git clone https://github.com/upkie/ppo_balancer.git
$ cd ppo_balancer
$ conda env create -f environment.yaml
$ conda activate ppo_balancer
$ make show_training

References

	Spinning Up in Deep Reinforcement Learning ⭐
	Learning Dexterous In-Hand Manipulation
	Learning Agile and Dynamic Motor Skills for Legged Robots ⭐
	Learning quadrupedal locomotion over challenging terrain
	Proximal policy optimization algorithms
