Biped Stabilization by Linear Feedback of the Variable-Height Inverted Pendulum Model
Abstract
The variable-height inverted pendulum (VHIP) model enables a new balancing strategy, based on height variations of the center of mass, in addition to the well-known ankle strategy. We propose a biped stabilizer based on linear feedback of the VHIP that is simple to implement, coincides with the state of the art for small perturbations, and is able to recover from larger perturbations thanks to this new strategy. This solution is based on "best-effort" pole placement of a 4D divergent component of motion for the VHIP under input feasibility and state viability constraints. We complement it with a suitable whole-body admittance control law and test the resulting stabilizer on the HRP-4 humanoid robot.
Videos
Chatty Presentation
TL;DW
Content
Paper  
Slides  
Presentation given at ICRA 2020  
Presentation given at JRL on 29 October 2019  
Source code of the controller  
DOI: 10.1109/ICRA40945.2020.9196715
BibTeX
@inproceedings{caron2020icra,
title = {Biped Stabilization by Linear Feedback of the Variable-Height Inverted Pendulum Model},
author = {Caron, St{\'e}phane},
booktitle = {IEEE International Conference on Robotics and Automation},
url = {https://hal.archives-ouvertes.fr/hal-02289919},
year = {2020},
month = may,
doi = {10.1109/ICRA40945.2020.9196715},
}
Discussion
Thanks to all those who have contributed to the conversation so far. Feel free to leave a reply using the form below, or subscribe to the Discussion's atom feed to stay tuned.

Attendee #1
Posted on
When you use the QP, this considers only the instantaneous error, not along a time horizon, is this correct?

Stéphane
Posted on
With DCMs, we are looking at an infinite-time horizon, with the DCM converging to its target value only as time goes to infinity. But this is similar to the infinite-horizon linear quadratic regulator: even though we look at an infinite-time horizon, the optimal feedback control law that we get is based only on the instantaneous error.
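To illustrate this point, here is a minimal sketch (not code from the paper; the numbers are assumed for illustration): for a scalar system, the infinite-horizon LQR gain comes from the algebraic Riccati equation, yet the resulting feedback law only reads the instantaneous error.

```python
import math

def scalar_lqr_gain(a, b, q, r):
    """Solve the scalar continuous-time algebraic Riccati equation
    2*a*P - (b**2 / r) * P**2 + q = 0 for P > 0, return K = (b / r) * P."""
    beta = b ** 2 / r
    P = (a + math.sqrt(a ** 2 + beta * q)) / beta
    return (b / r) * P

# DCM error dynamics xi_dot = omega * (xi - r) give, around the reference,
# d(dxi)/dt = omega * dxi - omega * dr, i.e. a = omega and b = -omega
# with the ZMP offset dr as input.
omega = 3.5  # rad/s, natural frequency sqrt(g / h) (assumed value)
K = scalar_lqr_gain(a=omega, b=-omega, q=1.0, r=0.1)

# The LQR cost integrates over an infinite horizon, but the law
# dr = -K * dxi uses only the current error, and it stabilizes
# the divergent mode:
closed_loop_pole = omega - (-omega) * K  # a - b * K
print(closed_loop_pole)  # negative => stable
```

For the scalar case the closed-loop pole works out to \(-\sqrt{a^2 + (b^2/r) q}\), always strictly negative, which is the infinite-horizon guarantee baked into a purely instantaneous feedback.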


Attendee #2
Posted on
How can we generate reference trajectories?

Stéphane
Posted on
There are several solutions:
- The seminal work by Englsberger et al. (2015) provides a closed-form solution, which is also handy for e.g. step timing adaptation.
- In this work, I used linear model predictive control for trajectory optimization during experiments with HRP-4 (see the C++ implementation); that is to say, reference trajectories were simply based on the linear inverted pendulum model. Note that this is only for walking. While standing, the reference has a constant center of mass position \(c^d\) and \(\dot{c}^d = 0\).
- We can also generate VHIP references using the CaptureProblemSolver, which is a custom sequential quadratic programming (SQP) implementation tailored to this model. The algorithm is described here.
- We can also use a general optimal control framework to cast the full trajectory generation problem. Examples of such frameworks today include CasADi, Crocoddyl and tvm.
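For the standing case mentioned above, the reference is fully determined by static equilibrium of the VHIP (\(\lambda (c - r) + g = 0\)). A minimal sketch, assuming this VHIP convention; the function name and numbers are made up for illustration:

```python
import math

GRAVITY = 9.81  # m/s^2

def standing_reference(com, zmp_height=0.0):
    """Static-equilibrium VHIP reference while standing: constant CoM c^d,
    zero velocity, ZMP r^d right below the CoM, and virtual stiffness
    lambda^d = g / (c_z - r_z) so that lambda * (c - r) + g = 0."""
    cx, cy, cz = com
    r_d = (cx, cy, zmp_height)              # ZMP below the CoM
    lambda_d = GRAVITY / (cz - zmp_height)  # virtual leg stiffness
    omega_d = math.sqrt(lambda_d)           # DCM natural frequency
    return r_d, lambda_d, omega_d

# Example: CoM 0.8 m above the ground, slightly off-center laterally.
r_d, lambda_d, omega_d = standing_reference((0.0, 0.05, 0.8))
```

With these values the gravito-inertial terms cancel exactly, so the reference DCM coincides with the CoM and the stabilizer only has to reject perturbations.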


Attendee #3
Posted on
You mention a reference trajectory, but your experiment does not seem to follow a particular trajectory. In the context of balancing, what kind of reference trajectory should we care about?

Stéphane
Posted on
Yes, in both the pymanoid example and the HRP-4 experiment the reference trajectory is \(c^d = \mathit{constant}\) and \(\dot{c}^d = 0\) (with the corresponding \(\lambda^d\) and \(r^d\) computed for static equilibrium). For the balance controller there is no concept of "standing" or "walking", as you can see in the following block diagram: Balance Control consists of Reduced Model Tracking and Force Control, while the switch between standing and walking happens in Trajectory Generation.
See also the block diagram for balance control.


Attendee #4
Posted on
In the video, why do you formulate the desired error dynamics only as a spring system, instead of a spring and damper?

Stéphane
Posted on
Actually, Morisawa et al. (2012) (reference 3 in the video) has PID desired error dynamics. When I cite it in the video, I'm only referring to the fact that pole placement means defining your desired error dynamics.
In this work we focus on the proportional (spring) term because it's the most significant in practice. The derivative (damping) term usually has a relatively low gain because state estimators tend to yield noisy DCM derivatives. On HRP-4 we set it to zero, and on HRP-2Kai we set it to a small value (not zero, because the stiffness of the flexible joint between its ankle and sole is lower than on HRP-4, and damping helps compensate the ensuing vibrations at high proportional gain). This might evolve if somebody manages to design a smoother DCM state estimator ;)
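In the LIP case, the proportional term alone already gives clean spring-like error dynamics. A one-axis sketch (not the paper's 4D controller; gains and numbers are assumptions):

```python
def dcm_feedback_zmp(xi, xi_d, r_d, omega, k_p):
    """ZMP command from proportional DCM feedback (LIP case, one axis).

    Substituting r = r_d + (1 + k_p / omega) * (xi - xi_d) into the DCM
    dynamics xi_dot = omega * (xi - r) yields the spring-like desired
    error dynamics d(Delta xi)/dt = -k_p * Delta xi.
    """
    return r_d + (1.0 + k_p / omega) * (xi - xi_d)

# Simulate the closed-loop DCM error to check the exponential decay:
omega, k_p, dt = 3.5, 4.0, 0.001  # assumed values
xi, xi_d, r_d = 0.1, 0.0, 0.0     # 10 cm initial DCM offset
for _ in range(1000):             # 1 s of simulation
    r = dcm_feedback_zmp(xi, xi_d, r_d, omega, k_p)
    xi += dt * omega * (xi - r)
print(xi)  # ~0.1 * exp(-k_p * 1.0), about 0.0018
```

A damping gain would add a \((k_d / \omega) \Delta\dot{\xi}\) term to the ZMP command, but as noted above it amplifies estimator noise, which is why it is kept at or near zero in practice.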


Attendee #5
Posted on
Let’s say we are in the wild searching for more “ducks” (i.e., higher-order generalizations of the DCM). What characteristics must these quantities have to be considered a DCM?

Stéphane
Posted on
I see no definitive answer, but I'd venture to say:
- They need to be "divergent": the trajectory of a DCM \(\xi\) is unbounded unless the input \(u\) satisfies a specific (dynamics-dependent) condition, the boundedness condition. Alternatively, we can take inspiration from Coppel and lower-bound this unboundedness by exponentials, but we may need to look farther than exponentials in general (e.g. in the 4D DCM, the \(\dot{\omega} = \omega^2\) component diverges super-exponentially).
- They should decouple our second-order system into two consecutive first-order systems. This feels less like a property of the system and more like something we want. Here, when we choose to use Mike Hopkins's time-varying DCM, we are making sure CoM dynamics depend only on the DCM (\(\dot{c} = \omega (\xi - c)\)). Secondly, we make sure the DCM depends only on the contact-wrench input, to get a decoupling similar to the LIP one (replacing "ZMP" by "contact wrench" and "capture point" by "4D DCM").
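In the LIP case with constant \(\omega\), this decoupling can be checked in a few lines (a sketch with assumed numbers, not code from the paper): differentiating the DCM definition \(\xi = c + \dot{c}/\omega\) under the LIP dynamics \(\ddot{c} = \omega^2 (c - r)\) gives exactly the two cascaded first-order systems.

```python
omega = 3.5            # natural frequency sqrt(g / h), LIP case (assumed value)
c, c_dot = 0.12, -0.3  # arbitrary CoM position and velocity (one axis)
r = 0.05               # ZMP position

c_ddot = omega ** 2 * (c - r)    # LIP dynamics: second-order in c
xi = c + c_dot / omega           # DCM definition
xi_dot = c_dot + c_ddot / omega  # time derivative of the DCM definition

# The second-order system splits into two first-order ones:
residual_dcm = xi_dot - omega * (xi - r)  # DCM driven by the ZMP only
residual_com = c_dot - omega * (xi - c)   # CoM driven by the DCM only
print(residual_dcm, residual_com)         # both zero up to rounding
```

The same cascade structure is what the 4D DCM preserves for the VHIP, with the contact wrench taking the role of the ZMP.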


Attendee #6
Posted on
It seems like the 4D DCM in this case is a consequence of your control parameterization in the sense that if you didn’t have a virtual stiffness \(\lambda\), you wouldn’t have the Riccati equation pop out for \(\omega\). Do you agree?
Where I’m pointing with this is that you could alternatively consider variation dynamics for the CoM directly, with some other parameterization for the forces. For instance if you just used the force as a control input, the CoM dynamics would be linear themselves. And so I’m curious what advantages we have by essentially lifting the CoM dynamics to this higher dimension with the addition of \(\omega\). Is it that constraints are easier to enforce?

Stéphane
Posted on
The main advantage is that we get viability, that is, not only short-term but also long-term force feasibility.
Looking back at the LIP, the main driver to go this way is that it extends constraints from short-term input feasibility to long-term state viability. If we take a feedback controller using the force as control input, \(F = k_p \Delta c + k_d \Delta \dot{c}\), we know from Sugihara (2009) that \(k_d = \frac{k_p}{\omega} - m \omega\) (minus \(m \omega\) because we use the force) is the choice that yields maximum capturability (i.e., the linear controller with the largest basin of attraction) under ZMP constraints. There we get our connection from short-term constraints to long-term ones: ZMP inequality constraints prompt us to express dynamics with the ZMP as input, whereupon a constant \(\omega\) appears. This constant determines the feedback gains that maximize capturability.
Intuitively, the reason why this controller catches all capturable states is that it spends no input trying to control the CCM; everything goes to the DCM. Although we don't yet have a proof of maximum capturability for 4D DCM feedback, the controller behaves with similar parsimony: it only spends input on the DCM, and adds height variations only when it has to.
Going back to the first part of your question, the benefit of this control parameterization is that we optimize infinite-time horizon trajectories in a quadratic program (under variation dynamics, with no guarantee that we maximize capturability for the full nonlinear system). "Solving over an infinite-time horizon" is a practical way to extend input feasibility constraints to state viability ;) If we use the force as control input, our dynamics are simplified, but force constraints become CoM-dependent and we can't predict what will happen to them over an infinite horizon (the set of feasible forces may vanish).
