Abstract
This is a crash course on applying reinforcement learning to train policies that balance real legged robots. We first review the necessary basics: partially-observable Markov decision processes, value functions, and the goal of reinforcement learning. We then turn to policy optimization: REINFORCE, policy gradients, and proximal policy optimization (PPO). We conclude with techniques to train real-robot policies from simulation data: domain randomization, simulation augmentation, teacher-student distillation, reward shaping, ...
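For reference, here are the two key formulas behind the policy-optimization part, in their standard form. REINFORCE estimates the gradient of the expected return $J(\theta)$ of a policy $\pi_\theta$ from sampled trajectories $\tau$, while PPO maximizes a clipped surrogate objective:

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \right],
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[ \min\big( r_t(\theta)\, \hat{A}_t,\ \operatorname{clip}(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon)\, \hat{A}_t \big) \right],
$$

where $R(\tau)$ is the return of trajectory $\tau$, $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the probability ratio, $\hat{A}_t$ an advantage estimate, and $\epsilon$ the clipping parameter.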
Content
- Slides
- Slides source (CC-BY-4.0 license)
Example

On Linux, you can train and run the open-source PPO balancer for Upkie wheeled bipeds:
$ git clone https://github.com/upkie/ppo_balancer.git
$ cd ppo_balancer
$ conda env create -f environment.yaml
$ conda activate ppo_balancer
$ make show_training
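For a sense of what PPO training looks like in code, here is a minimal sketch using Stable-Baselines3 on a standard Gymnasium task. It illustrates the algorithm under assumed off-the-shelf tooling, not the balancer's actual training script; `Pendulum-v1` stands in for a robot environment:

```python
# Minimal PPO training sketch with Stable-Baselines3 (assumed tooling,
# not the ppo_balancer training script).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")  # continuous-control stand-in for a robot env
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)  # optimize the policy with PPO

# Roll out the trained policy for one episode
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```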
References
Lectures
This lecture was given in the following courses:
- Fall 2024 class at Master MVA, Paris.
- Fall 2024 class at Mines de Paris, Paris.
- Fall 2024 class at École normale supérieure, Paris.
- Fall 2023 class at Master MVA, Paris.
- Fall 2023 class at Mines de Paris, Paris.
- Fall 2023 class at École normale supérieure, Paris.
Discussion
Feel free to post a comment by e-mail. Your e-mail address will not be disclosed.