Reinforcement learning for legged robots

Stéphane Caron. Fall 2023 class at Master MVA and École normale supérieure, Paris.


This is a crash course on applying reinforcement learning to train policies that balance real legged robots. We first review the necessary basics: partially observable Markov decision processes, value functions, and the goal of reinforcement learning. We then focus on policy optimization: REINFORCE, policy gradient and proximal policy optimization (PPO). After some practical advice on training with PPO, we finally focus on techniques to train real-robot policies from simulation data: domain randomization, simulation augmentation and reward shaping.
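To give a flavor of domain randomization, here is a minimal Python sketch, not the code used in the class or in the Upkie repository. It assumes that the simulator exposes a few physical parameters that can be reset at the start of each episode. The `SimParams` class, its fields and the sampling ranges are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical physical parameters for one simulated episode."""

    mass_scale: float  # multiplier applied to nominal link masses
    wheel_friction: float  # wheel-ground friction coefficient
    action_delay_steps: int  # control steps between command and actuation


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw one set of parameters, to be applied at an episode reset.

    The ranges below are illustrative assumptions, not tuned values.
    """
    return SimParams(
        mass_scale=rng.uniform(0.9, 1.1),
        wheel_friction=rng.uniform(0.5, 1.0),
        action_delay_steps=rng.randint(0, 2),  # both bounds are inclusive
    )


# Seeded generator so that the sampled episodes are reproducible
rng = random.Random(0)
episodes = [sample_sim_params(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

A policy trained over many such episodes sees a distribution of dynamics rather than a single nominal model, which is the intuition behind using randomization to help transfer from simulation to the real robot.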


Upkie robot balancing in simulation and in the real world

On Linux, you can train and run the open-source PPO balancer for Upkie wheeled bipeds:

$ git clone
$ cd upkie
$ ./tools/bazelisk run //agents/ppo_balancer:train -- --show
$ ./tools/bazelisk run //agents/ppo_balancer:run

