Biped Stabilization by Linear Feedback of the Variable-Height Inverted Pendulum Model

Stéphane Caron. Presented at ICRA 2020.


The variable-height inverted pendulum (VHIP) model enables a new balancing strategy by height variations of the center of mass, in addition to the well-known ankle strategy. We propose a biped stabilizer based on linear feedback of the VHIP that is simple to implement, coincides with the state-of-the-art for small perturbations and is able to recover from larger perturbations thanks to this new strategy. This solution is based on "best-effort" pole placement of a 4D divergent component of motion for the VHIP under input feasibility and state viability constraints. We complement it with a suitable whole-body admittance control law and test the resulting stabilizer on the HRP-4 humanoid robot.


Slack channel at ICRA 2020


Chatty Presentation



  title = {Biped Stabilization by Linear Feedback of the Variable-Height Inverted Pendulum Model},
  author = {Caron, St{\'e}phane},
  booktitle = {IEEE International Conference on Robotics and Automation},
  url = {},
  year = {2020},
  month = may,


Let's talk on the ICRA 2020 Slack channel for this work! I will report the main points of our discussions below. See also the discussion we had following the presentation on divergent components of motion at JRL.

When you use the QP, it considers only the instantaneous error, not the error along a time horizon. Is this correct?

With DCMs, we are looking at an infinite-time horizon, with the DCM converging to its target value only as time goes to infinity. But this is similar to the infinite-horizon linear quadratic regulator: even though we look at an infinite-time horizon, the optimal feedback control law that we get is only based on the instantaneous error.
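As an illustration of a feedback law that only reads the instantaneous error, here is a minimal sketch for the simpler 1D linear inverted pendulum (not the VHIP controller of the paper). It assumes the DCM dynamics \(\dot{\xi} = \omega (\xi - z)\), a stationary reference \(\xi^d = z^d\), and a proportional gain `k` chosen for illustration, so that the error follows \(\Delta \dot{\xi} = -k \Delta \xi\). The function name and the numerical values are hypothetical.

```python
def simulate_dcm_feedback(xi0, xi_d=0.0, z_d=0.0, omega=3.5, k=5.0,
                          dt=1e-3, duration=2.0):
    """Integrate xi_dot = omega * (xi - z) under proportional DCM feedback.

    Assumed 1D LIP model with a stationary reference (xi_d == z_d).
    Returns the list of DCM errors after each integration step.
    """
    xi = xi0
    errors = []
    for _ in range(int(duration / dt)):
        error = xi - xi_d                     # instantaneous DCM error only
        z = z_d + (1.0 + k / omega) * error   # ZMP command (pole placement)
        xi += dt * omega * (xi - z)           # forward Euler step
        errors.append(xi - xi_d)
    return errors


errors = simulate_dcm_feedback(xi0=0.05)
```

No horizon appears in the control law itself: the ZMP command at each step is a function of the current error, yet the error converges only asymptotically.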

How can we generate reference trajectories?

There are several solutions:

  • The seminal work by Englsberger et al. (2015) provides a closed-form solution, which is also handy for e.g. step timing adaptation.
  • In this work, I used a linear model predictive control trajectory optimization during experiments with HRP-4 (see the C++ implementation), that is to say, reference trajectories were simply based on the linear inverted pendulum model. Note that this is only for walking. While standing, the reference has a constant center of mass position \(c^d\) and \(\dot{c}^d = 0\).
  • We can also generate VHIP references using the CaptureProblemSolver, which is a custom sequential quadratic programming (SQP) implementation tailored to this model. The algorithm is described here.
  • We can also use a general optimal control framework to cast the full trajectory generation problem. Examples of such frameworks today include CasADi, Crocoddyl and tvm.
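To give a feel for what a closed-form solution looks like, here is a minimal 1D sketch in the spirit of the first option. It is a simplification, not the method of Englsberger et al.: it assumes LIP dynamics \(\dot{\xi} = \omega (\xi - z)\), one constant ZMP per step, equal step durations, and a DCM that comes to rest over the last ZMP. The function names are hypothetical.

```python
import math


def dcm_waypoints(zmp_positions, step_duration, omega):
    """DCM at the start of each step, computed by backward recursion.

    Assumes a constant ZMP during each step and a DCM that ends at rest
    over the last ZMP.
    """
    xi_end = zmp_positions[-1]
    starts = []
    for z in reversed(zmp_positions):
        # Solve xi_end = z + exp(omega * T) * (xi_start - z) for xi_start
        xi_start = z + math.exp(-omega * step_duration) * (xi_end - z)
        starts.append(xi_start)
        xi_end = xi_start
    starts.reverse()
    return starts


def dcm_reference(xi_start, z, omega, t):
    """DCM at time t within a step, for a constant ZMP z."""
    return z + math.exp(omega * t) * (xi_start - z)


zmps = [0.0, 0.1, -0.1, 0.0]  # e.g. lateral ZMP coordinates [m]
starts = dcm_waypoints(zmps, step_duration=0.8, omega=3.5)
```

Because every waypoint is an explicit function of the step duration, changing the timing of a step only requires re-evaluating these exponentials.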

You mention a reference trajectory, but your experiment does not seem to follow a particular trajectory. In the context of balancing, what kind of reference trajectory should we care about?

Yes, in both the pymanoid example and HRP-4 experiment the reference trajectory is \(c^d = \mathit{constant}\) and \(\dot{c}^d = 0\) (with the corresponding \(\lambda^d\) and \(r^d\) computed for static equilibrium). For the balance controller there is no concept of "standing" or "walking", as you can see in the following block diagram: Balance Control consists of Reduced Model Tracking and Force Control, while the switch between standing and walking happens in Trajectory Generation.
Block diagram illustrating balance control for legged robots.
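For a standing reference, the static-equilibrium values can be computed in a few lines. The sketch below assumes VHIP dynamics of the form \(\ddot{c} = \lambda (c - r) + g\) with gravity along the vertical axis, and a center of pressure \(r\) on a horizontal ground plane. The function name and the `ground_height` parameter are hypothetical.

```python
GRAVITY = 9.81  # [m/s^2], assumed gravity constant


def static_equilibrium_reference(com, ground_height=0.0):
    """Return (lambda_d, r_d) such that the CoM acceleration is zero.

    Assumes c_dd = lambda * (c - r) + g with g = (0, 0, -GRAVITY) and a
    center of pressure r lying on a horizontal plane at `ground_height`.
    """
    x, y, z = com
    height = z - ground_height
    if height <= 0.0:
        raise ValueError("CoM must be above the ground")
    lambda_d = GRAVITY / height   # stiffness that cancels gravity
    r_d = (x, y, ground_height)   # center of pressure right below the CoM
    return lambda_d, r_d


com = (0.02, -0.01, 0.8)
lambda_d, r_d = static_equilibrium_reference(com)
```

With these values the balance controller receives a reference like any other; nothing in it is specific to standing.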

In the video, why do you only formulate desired error dynamics as a spring system, instead of spring and damper?

Actually Morisawa-san et al. (2012) (reference 3 in the video) have PID desired error dynamics. When I cite it in the video, I'm only referring to the fact that pole placement means defining your desired error dynamics.

In this work we focus on the proportional (spring) term because it's the most important in practice. The derivative (damping) term usually has a (very) low gain because state estimators tend to yield noisy DCM derivatives. On HRP-4 we set it to zero, and on HRP-2Kai we set it to a small value (not zero because the stiffness of its flexible joint between sole and ankle is lower than for HRP-4, and damping helps compensate the ensuing vibrations at high P gain). This might evolve if somebody manages to design a good DCM state estimator ;)

Let’s say we are in the wild searching for more “ducks” (i.e., higher-order generalizations of the DCM). What characteristics must these quantities have to be considered a DCM?

I see no definitive answer, but I'd venture to say:

  1. They need to be "divergent". The trajectory of a DCM \(\xi\) is unbounded unless the input \(u\) satisfies a specific (dynamics-dependent) condition (the boundedness condition). Alternatively, we can take inspiration from Coppel and lower-bound this unboundedness by exponentials, but we may need to look farther than exponential in general (e.g. in the 4D DCM the \(\dot{\omega} = \omega^2\) component diverges super-exponentially).
  2. They should decouple our second-order system into two consecutive first-order systems. This feels less like a property of the system and more like something we want. Here, when we choose to use Mike Hopkins's time-varying DCM, we are making sure CoM dynamics depend only on the DCM (\(\dot{c} = \omega (\xi - c)\)). Secondly, we make sure the DCM depends only on the contact wrench input to get a decoupling similar to the LIP case (replacing "ZMP" by "contact wrench" and "capture point" by "4D DCM"):
Decoupling of second-order dynamics into two first-order systems.
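The decoupling can be checked numerically along one axis. The sketch below assumes vertical VHIP dynamics \(\ddot{c} = \lambda (c - r) - g\), the Riccati equation \(\dot{\omega} = \omega^2 - \lambda\) and the time-varying DCM \(\xi = c + \dot{c} / \omega\). Under these assumptions, differentiating \(\xi\) gives \(\dot{\xi} = \frac{\lambda}{\omega} (\xi - r) - \frac{g}{\omega}\), where CoM position and velocity enter only through \(\xi\). The function name and test values are hypothetical.

```python
G = 9.81  # [m/s^2], assumed gravity constant


def dcm_derivative_two_ways(c, cd, omega, lam, r, dt=1e-6):
    """Compare the closed-form DCM derivative with a finite difference.

    Assumed model (vertical axis): c_dd = lam * (c - r) - G and
    omega_dot = omega**2 - lam, with DCM xi = c + cd / omega.
    """
    xi = c + cd / omega
    # Closed form: depends on the state only through xi
    analytic = (lam / omega) * (xi - r) - G / omega
    # Finite difference: one Euler step of the full second-order system
    cdd = lam * (c - r) - G
    omega_dot = omega ** 2 - lam
    c_next = c + dt * cd
    cd_next = cd + dt * cdd
    omega_next = omega + dt * omega_dot
    xi_next = c_next + cd_next / omega_next
    numeric = (xi_next - xi) / dt
    return analytic, numeric


analytic, numeric = dcm_derivative_two_ways(
    c=0.8, cd=0.1, omega=3.4, lam=12.0, r=0.0)
```

The first-order CoM equation \(\dot{c} = \omega (\xi - c)\) holds by definition of \(\xi\), so the two stages together reproduce the second-order system.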

It seems like the 4D DCM in this case is a consequence of your control parameterization in the sense that if you didn’t have a virtual stiffness lambda, you wouldn’t have the Riccati equation pop out for omega. Do you agree?

Where I’m pointing with this is that you could alternatively consider variation dynamics for the CoM directly, with some other parameterization for the forces. For instance if you just used the force as a control input, the CoM dynamics would be linear themselves. And so I’m curious what advantages we have by essentially lifting the CoM dynamics to this higher dimension with the addition of omega. Is it that constraints are easier to enforce?

I totally agree.

Looking back at the LIP, the main driver to go this way is that it extends constraints from short-term input feasibility to long-term state viability. If we take a feedback controller using the force as control input \(F = k_p \Delta c + k_d \Delta \dot{c}\), we know from Sugihara (2009) that \(k_d = \frac{k_p}{\omega} - m \omega\) (minus \(m \omega\) because we use the force) is the choice that yields maximum capturability (i.e. the linear controller with the largest basin of attraction) under ZMP constraints. There we get our connection from short-term constraints to long-term: ZMP inequality constraints prompt us to express dynamics with the ZMP as input, whereupon a constant \(\omega\) appears. This constant determines the feedback gains that maximize capturability.

Intuitively, the reason why this controller catches all capturable states is that it spends no input trying to control the CCM; everything goes to the DCM. Although we don't yet know of a proof of maximum capturability for 4D DCM feedback, the controller behaves with similar parsimony: it only spends input on the DCM, and adds height variations only when it has to.
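The link between this gain choice and "everything goes to the DCM" can be checked on the LIP. The sketch below assumes error dynamics \(\Delta \ddot{c} = \omega^2 (\Delta c - \Delta z)\), proportional DCM feedback \(\Delta z = (1 + k / \omega) \Delta \xi\) with \(\Delta \xi = \Delta c + \Delta \dot{c} / \omega\), and the sign convention \(\Delta F = k_p \Delta c + k_d \Delta \dot{c}\), under which \(k_p = -m \omega k\) is negative. The function names and the gain `k` are hypothetical.

```python
def force_gains_from_dcm_gain(k, omega, mass):
    """Force-feedback gains equivalent to DCM feedback with gain k (assumed LIP)."""
    k_p = -mass * omega * k
    k_d = k_p / omega - mass * omega
    return k_p, k_d


def force_from_dcm_feedback(dc, dcd, k, omega, mass):
    """Force variation produced by proportional DCM feedback in the LIP."""
    dxi = dc + dcd / omega               # DCM error
    dz = (1.0 + k / omega) * dxi         # ZMP variation from pole placement
    return mass * omega ** 2 * (dc - dz)  # Delta F = m * Delta c_dd


k, omega, mass = 5.0, 3.5, 40.0
k_p, k_d = force_gains_from_dcm_gain(k, omega, mass)
dc, dcd = 0.03, -0.2
force_pd = k_p * dc + k_d * dcd
force_dcm = force_from_dcm_feedback(dc, dcd, k, omega, mass)
```

Both expressions give the same force for any CoM error, which is one way to read the gain relation: a PD controller with this damping gain is a DCM feedback controller in disguise.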

Going back to the first part of your question, the benefit of this control parameterization is that we optimize infinite-time horizon trajectories in a quadratic program (under variation dynamics, no guarantee that we maximize capturability for the full nonlinear system). "Solving infinite-time horizon" is a practical way to extend input feasibility constraints to state viability ;-) If we use the force as control input, our dynamics are simplified, but as you point out force constraints become CoM-dependent and we might have a hard time solving over an infinite horizon.
