This is a crash course on applying reinforcement learning to train policies that balance real legged robots. We first review the necessary basics: partially observable Markov decision processes, value functions, and the goal of reinforcement learning. We then focus on policy optimization: REINFORCE, the policy gradient, and proximal policy optimization (PPO). After some practical advice on training with PPO, we turn to techniques for training real-robot policies from simulation data: domain randomization, simulation augmentation and reward shaping.
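Since the whole course builds toward PPO, here is the clipped surrogate objective from the PPO paper (last item in the reading list below) as a preview; $\hat{A}_t$ is an advantage estimate and $\epsilon$ the clipping range:

$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}$$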
On Linux, you can train and run the open-source PPO balancer for Upkie wheeled bipeds:
$ git clone https://github.com/upkie/upkie.git
$ cd upkie
$ ./tools/bazelisk run //agents/ppo_balancer:train -- --show
$ ./tools/bazelisk run //agents/ppo_balancer:run
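For a smaller, self-contained starting point, here is a minimal sketch of what PPO training can look like with Stable-Baselines3 and Gymnasium. This is an illustration, not the repository's actual training code, and Pendulum-v1 is a stand-in environment rather than an Upkie one:

# Minimal PPO training sketch with Stable-Baselines3 and Gymnasium.
# Pendulum-v1 is a stand-in task for illustration; it is not an
# Upkie environment from the repository above.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")
model = PPO("MlpPolicy", env, verbose=1)  # MLP actor-critic policy
model.learn(total_timesteps=100_000)      # collect rollouts, run PPO updates
model.save("ppo_pendulum")

# Roll out the trained policy:
obs, _ = env.reset()
for _ in range(200):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()

Swapping the stand-in environment for a robot environment (simulated or real) is, at this level of abstraction, a one-line change; the hard part is everything the rest of the course covers.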
- Spinning Up in Deep Reinforcement Learning ⭐
- Learning Dexterous In-Hand Manipulation
- Learning Agile and Dynamic Motor Skills for Legged Robots ⭐
- Learning Quadrupedal Locomotion over Challenging Terrain
- Proximal Policy Optimization Algorithms