This is a guest post from Wilson Jallet. Check out Wilson's blog for more optimal-control posts!
Most control loops used in real-world systems are simple feedback loops proportional to the error, its derivative, or its integral (this is called PID control). However, this kind of control can exhibit undesirable behavior such as oscillations, or it may fail to converge to a given setpoint quickly, if at all. More complex systems such as robots, satellites or cars can come with precise performance requirements. One way to specify such performance is to define a mathematical objective function and construct control actions that are optimal with respect to this objective (this is called optimal control). In this blog post, we'll recap how this can be done on the most basic problem in control theory, the Linear-quadratic regulator (LQR), which is analogous to the classic Kalman filter in signal processing.
Linear-quadratic regulator
The goal is to control a linear time-invariant dynamical system:
$$x_{t+1} = A x_t + B u_t$$
where the state is $x_t \in \mathbb{R}^n$ and the control action is $u_t \in \mathbb{R}^k$, which can e.g. model motor torques in robotic systems. The objective is to find a sequence of controls $u_0, \ldots, u_{T-1}$.
We define a running cost:
$$C(x, u) = \frac{1}{2} x^\top Q x + \frac{1}{2} u^\top R u,$$
and a terminal cost:
$$C_f(x) = \frac{1}{2} x^\top Q_f x.$$
Both are quadratic functions of the state $x$ and control $u$. We further assume $Q$ is a positive semidefinite matrix and $R$ is positive definite, making the overall cost function strongly convex in the control $u$. This is the most basic form of the LQR problem: more complications can be introduced, such as state-control cross terms and linear terms in the cost functions, or making the dynamics affine rather than linear in $(x_t, u_t)$.
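To make the setup concrete, here is a minimal NumPy sketch of one possible problem instance. The double-integrator dynamics, the weights, the horizon and the initial state below are illustrative choices made for this sketch, not part of the LQR formulation itself:

```python
import numpy as np

# Illustrative instance (an assumption of this sketch): a discretized double
# integrator. State x = (position, velocity), control u = acceleration.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
n, k = B.shape             # state and control dimensions

Q = np.diag([1.0, 0.1])    # positive semidefinite state weight
R = 1e-2 * np.eye(k)       # positive definite control weight
Qf = 10.0 * np.eye(n)      # terminal weight
T = 50                     # horizon length
x0 = np.array([1.0, 0.0])  # initial state, reused in the later snippets

def running_cost(x, u):
    """C(x, u) = 1/2 x'Q x + 1/2 u'R u."""
    return 0.5 * x @ Q @ x + 0.5 * u @ R @ u

def terminal_cost(x):
    """C_f(x) = 1/2 x'Qf x."""
    return 0.5 * x @ Qf @ x
```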
The optimal control problem is written as a constrained optimization problem:
$$\begin{aligned} \min_{u_0, \ldots, u_{T-1}} \quad & \sum_{t=0}^{T-1} C(x_t, u_t) + C_f(x_T) \\ \text{s.t.} \quad & x_{t+1} = A x_t + B u_t \end{aligned}$$
One way we can solve this problem is by writing the corresponding Karush-Kuhn-Tucker (KKT) conditions. Looking at the KKT conditions introduces the co-state equations $\lambda_t = A^\top \lambda_{t+1} - Q x_t$ for the Lagrange multipliers, the discrete-time counterpart of those appearing in Pontryagin's minimum principle for continuous-time problems, as we will see below. But let us first look at the canonical way of solving this: dynamic programming.
Dynamic programming
In dynamic programming, we introduce the optimal cost-to-go or value function:
$$V_t(x) = \min_{u_t, u_{t+1}, \ldots, u_{T-1}} \sum_{\tau=t}^{T-1} C(x_\tau, u_\tau) + C_f(x_T), \quad \text{where } x_t = x$$
We can then show that the sequence of functions (Vt)t obeys a dynamic programming principle, Bellman's equation:
$$V_t(x) = \min_u\, C(x, u) + V_{t+1}(A x + B u).$$
This equation can be solved by noticing that $V_T(x) = \frac{1}{2} x^\top P_T x$ with $P_T = Q_f$ and introducing the ansatz $V_t(x) = \frac{1}{2} x^\top P_t x$. This is an educated guess, as quadratics are closed under (partial) minimization. Plugging this into the Bellman equation yields:
$$V_t(x) = \min_u\, \frac{1}{2} x^\top (Q + A^\top P_{t+1} A) x + \frac{1}{2} u^\top (R + B^\top P_{t+1} B) u + u^\top B^\top P_{t+1} A x.$$
The optimum of this problem is $u = K_t x$, where
$$K_t = -(R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A$$
is called the feedback gain matrix. The cost-to-go matrix then satisfies:
$$P_t = Q + A^\top P_{t+1} A + K_t^\top B^\top P_{t+1} A = Q + (A + B K_t)^\top P_{t+1} A.$$
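Below is a minimal sketch of the resulting backward Riccati pass and of a forward rollout under the feedback $u_t = K_t x_t$. It reuses the illustrative problem data ($A$, $B$, $Q$, $R$, $Q_f$, $T$, $x_0$) from the earlier sketch; the function names are ad-hoc choices:

```python
import numpy as np  # problem data A, B, Q, R, Qf, T, x0 as in the sketch above

def backward_riccati(A, B, Q, R, Qf, T):
    """Backward pass: returns the cost-to-go matrices P_t and feedback gains K_t."""
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = Qf
    for t in reversed(range(T)):
        # K_t = -(R + B'P_{t+1}B)^{-1} B'P_{t+1}A  (linear solve, no explicit inverse)
        K[t] = -np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
        # P_t = Q + (A + B K_t)' P_{t+1} A
        P[t] = Q + (A + B @ K[t]).T @ P[t + 1] @ A
    return P, K

def rollout(A, B, K, x0, T):
    """Forward pass: apply the feedback u_t = K_t x_t from the initial state x0."""
    xs, us = [x0], []
    for t in range(T):
        us.append(K[t] @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return xs, us

P, K = backward_riccati(A, B, Q, R, Qf, T)
xs, us = rollout(A, B, K, x0, T)
print("final state:", xs[-1])  # should be driven close to the origin
```

The gain is computed with a linear solve rather than an explicit matrix inverse, which is the usual numerically safer choice.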
What about KKT conditions?
The equations above can also be derived from the KKT conditions of the control problem, viewing it as a quadratic program over the variables $x = (x_1, \ldots, x_T)$ and $u = (u_0, \ldots, u_{T-1})$. Its Lagrangian reads:
$$L(x, u, \lambda) = \sum_{t=0}^{T-1} \frac{1}{2} u_t^\top R u_t + \frac{1}{2} x_t^\top Q x_t + \lambda_{t+1}^\top (x_{t+1} - A x_t - B u_t) + \frac{1}{2} x_T^\top Q_f x_T$$
Its associated KKT conditions boil down to:
$$\begin{aligned} \lambda_t &= A^\top \lambda_{t+1} - Q x_t, \quad t = 1, \ldots, T-1 \\ \lambda_T &= -Q_f x_T \\ R u_t &= B^\top \lambda_{t+1} \end{aligned}$$
Note that positive semidefiniteness of $Q$ and $Q_f$ is needed for the problem to be convex, so that these KKT conditions characterize an optimum, and positive definiteness of $R$ is required to solve for $u_t$ and get a well-defined solution. The Lagrange multipliers $\lambda_t$ are also called co-states. This result is a discrete-time equivalent of Pontryagin's minimum principle, which is established in the continuous-time control literature.
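To see this quadratic-program view concretely, one can assemble the full KKT system and solve it as a single linear system. The sketch below does so with NumPy, reusing the illustrative data from the earlier snippets; the variable ordering of $z$, the index helpers and the initial state are assumptions of this sketch:

```python
import numpy as np  # problem data A, B, Q, R, Qf, T, x0, n, k as above

nz = T * n + T * k   # variables z = (x_1, ..., x_T, u_0, ..., u_{T-1})
nc = T * n           # equality constraints x_{t+1} - A x_t - B u_t = 0

def ix(t):  # columns of x_t in z, for t = 1, ..., T
    return slice((t - 1) * n, t * n)

def iu(t):  # columns of u_t in z, for t = 0, ..., T-1
    return slice(T * n + t * k, T * n + (t + 1) * k)

# Quadratic cost 1/2 z'Hz (x_0 is fixed, so its cost term is a constant)
H = np.zeros((nz, nz))
for t in range(1, T):
    H[ix(t), ix(t)] = Q
H[ix(T), ix(T)] = Qf
for t in range(T):
    H[iu(t), iu(t)] = R

# Dynamics constraints G z = g
G = np.zeros((nc, nz))
g = np.zeros(nc)
for t in range(T):
    rows = slice(t * n, (t + 1) * n)
    G[rows, ix(t + 1)] = np.eye(n)
    G[rows, iu(t)] = -B
    if t == 0:
        g[rows] = A @ x0      # x_1 - B u_0 = A x_0
    else:
        G[rows, ix(t)] = -A   # x_{t+1} - A x_t - B u_t = 0

# KKT system: [H  G'] [z ]   [0]
#             [G  0 ] [nu] = [g],  with nu stacking (lambda_1, ..., lambda_T)
KKT = np.block([[H, G.T], [G, np.zeros((nc, nc))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(nz), g]))
z, nu = sol[:nz], sol[nz:]
```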
Derivation and equivalence to dynamic programming
We can introduce a co-state gain matrix $M_t$ such that $\lambda_t = -M_t x_t$, and the feedback $u_t = K_t x_t$. The terminal co-state matrix is $M_T = Q_f$.
Let's establish a recurrence relation for these matrices, starting from the KKT conditions:
$$\lambda_t = A^\top \lambda_{t+1} - Q x_t = -A^\top M_{t+1} x_{t+1} - Q x_t$$
It follows that:
$$\lambda_t = -(A^\top M_{t+1} A + Q) x_t - A^\top M_{t+1} B u_t = -(A^\top M_{t+1} A + Q + A^\top M_{t+1} B K_t) x_t$$
That is, $M_t = Q + A^\top M_{t+1} (A + B K_t)$.
Also, $R u_t = B^\top \lambda_{t+1} = -B^\top M_{t+1} x_{t+1} = -B^\top M_{t+1} (A x_t + B u_t)$, thus:
$$(R + B^\top M_{t+1} B) u_t = -B^\top M_{t+1} A x_t \implies K_t = -(R + B^\top M_{t+1} B)^{-1} B^\top M_{t+1} A$$
The sequences of co-state and feedback matrices $(M_t)_t$ and $(K_t)_t$ satisfy the same recurrence equations, with the same terminal condition $M_T = P_T = Q_f$, as the Riccati and feedback matrices of the dynamic programming approach, and are thus identical.
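As a quick numerical sanity check of this equivalence (under the illustrative data above), one can verify that the states, controls and co-states recovered from the KKT system coincide with the Riccati solution, where the co-states are $\lambda_t = -P_t x_t$. The variable names below continue the earlier sketches:

```python
import numpy as np  # continues the sketches above: P, K, xs, us (Riccati) and z, nu (KKT)

# Co-states predicted by the Riccati solution: lambda_t = -P_t x_t, t = 1, ..., T
lam_riccati = [-P[t] @ xs[t] for t in range(1, T + 1)]

# Quantities recovered from the KKT system
lam_kkt = [nu[t * n:(t + 1) * n] for t in range(T)]                # lambda_1, ..., lambda_T
xs_kkt = [z[t * n:(t + 1) * n] for t in range(T)]                  # x_1, ..., x_T
us_kkt = [z[T * n + t * k:T * n + (t + 1) * k] for t in range(T)]  # u_0, ..., u_{T-1}

# All three gaps should be zero up to floating-point error
print(max(np.linalg.norm(a - b) for a, b in zip(lam_riccati, lam_kkt)))
print(max(np.linalg.norm(a - b) for a, b in zip(xs[1:], xs_kkt)))
print(max(np.linalg.norm(a - b) for a, b in zip(us, us_kkt)))
```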
To go further
Optimal control of the linear-quadratic regulator is covered in more detail in the book Dynamic Programming and Optimal Control (Bertsekas, 2012). To go beyond this simple example, you can e.g. find on Wilson's blog an introduction to differential dynamic programming (DDP) as well as a deeper look into the Hamilton-Jacobi-Bellman equation in optimal control. You could also ask ChatGPT to implement the LQR 🤭