Conversion from least squares to quadratic programming

Casting a linear least squares problem to a quadratic program is a common question for newcomers, who see this operation routinely mentioned or taken for granted in writing. Let us review the details of this derivation using linear algebra over normed vector spaces.

Definitions

Least Squares

A constrained weighted linear least squares (LS) problem is written as:

\begin{equation*} \begin{array}{rl} \underset{x \in \mathbb{R}^n}{\mathrm{minimize}} & \frac{1}{2} \| R x - s \|^2_W = \frac{1}{2} (R x - s)^T W (R x - s) \\ \mathrm{subject\ to} & G x \leq h \\ & A x = b \end{array} \end{equation*}

This problem seeks the vector \(x\) of optimization variables such that \(R x\) is as "close" as possible to a prescribed vector \(s\), while satisfying a set of linear inequality and equality constraints defined by the matrix-vector pairs \((G, h)\) and \((A, b)\), respectively. Vector inequalities apply coordinate by coordinate. How close the two vectors \(R x\) and \(s\) are is measured by the weighted norm \(\| y \|_W = \sqrt{y^T W y}\), where the weight matrix \(W\) is positive semi-definite and usually diagonal. The overall function \(x \mapsto (1/2) \| R x - s \|^2_W\) is called the objective function (a.k.a. cost function) of the problem.
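To make this definition concrete, here is a small numerical sketch (with made-up \(R\), \(s\) and \(W\)) that evaluates the LS objective in NumPy; the matrix-product form matches the weighted-norm definition above:

import numpy as np

# Hypothetical problem data: R is 3x2, s is in R^3, W is a diagonal weight matrix
R = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
s = np.array([1.0, 0.0, -1.0])
W = np.diag([1.0, 2.0, 3.0])

x = np.array([0.5, -0.5])                    # candidate optimization vector
residual = R.dot(x) - s                      # R x - s
cost = 0.5 * residual.dot(W).dot(residual)   # (1/2) (R x - s)^T W (R x - s)
print(cost)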

Quadratic Programming

A quadratic program (QP) is written in standard form as:

\begin{equation*} \begin{array}{rl} \underset{x \in \mathbb{R}^n}{\mathrm{minimize}} & \frac12 x^T P x + q^T x \\ \mathrm{subject\ to} & G x \leq h \\ & A x = b \end{array} \end{equation*}

This problem seeks the vector \(x\) of optimization variables such that the quadratic objective function defined by the positive semi-definite matrix \(P\) and the vector \(q\) is minimized, while satisfying a set of linear inequality and equality constraints defined by the matrix-vector pairs \((G, h)\) and \((A, b)\), respectively. Vector inequalities apply coordinate by coordinate.
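Similarly, a short sketch evaluating the QP objective for made-up \(P\) and \(q\):

import numpy as np

# Hypothetical problem data: P is 2x2 positive semi-definite, q is in R^2
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 0.5])

x = np.array([0.5, -0.5])                 # candidate optimization vector
cost = 0.5 * x.dot(P).dot(x) + q.dot(x)   # (1/2) x^T P x + q^T x
print(cost)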

Equivalence between optimization problems

Let us define formally what it means to "cast" a least squares problem to a quadratic program. Since the two problems have the same inequality and equality constraints, the only difference between them lies in their objective functions. We will say that two objective functions \(f(x)\) and \(g(x)\) are equivalent, written \(f(x) \propto g(x)\), if they have the same minimizers over the feasible set:

\begin{equation*} f(x) \propto g(x) \ \Leftrightarrow \ \underset{x \in \mathbb{R}^n}{\mathrm{argmin}} \left\{ f(x) | G x \leq h, A x = b \right\} = \underset{x \in \mathbb{R}^n}{\mathrm{argmin}} \left\{ g(x) | G x \leq h, A x = b \right\} \end{equation*}

Informally, the symbol \(\propto\) thus means "does not change the optimum of the optimization problem".
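For instance, adding a constant or scaling by a positive factor preserves the minimizers, two facts we will rely on in the derivation below:

\begin{equation*} \forall \alpha > 0, \ \forall \beta \in \mathbb{R}, \quad f(x) \propto \alpha f(x) + \beta \end{equation*}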

Conversion to a quadratic objective

Our goal, starting from the LS objective function \(f(x) = (1/2) \| R x - s \|^2_W\), is to find a QP objective function \(g(x) = (1/2) x^T P x + q^T x\) such that \(f(x) \propto g(x)\). The one-half multiplicative constant is only there by convention and does not affect the optimum in any way, so we can set it aside for now and expand the weighted norm:

\begin{equation*} \begin{array}{rcl} \| R x - s \|_W^2 & = & (R x - s)^T W (R x - s) \\ & = & x^T R^T W R x - x^T R^T W s - s^T W R x + s^T W s \\ \end{array} \end{equation*}

We can discard the constant term \(s^T W s\) as it does not depend on the optimization vector \(x\) and therefore doesn't affect the optimum of our problem. We then have:

\begin{equation*} \| R x - s \|_W^2 \propto x^T R^T W R x - x^T R^T W s - s^T W R x \end{equation*}

Next, recall that \(W\) is positive semi-definite, hence in particular symmetric: \(W^T = W\). The number \(s^T W R x\) being real, it is equal to its own transpose, that is:

\begin{equation*} s^T W R x = (s^T W R x)^T = x^T R^T W^T s = x^T R^T W s \end{equation*}

Substituting this identity back into our expansion of the objective function, the two cross terms merge into one:

\begin{equation*} \| R x - s \|_W^2 \propto x^T (R^T W R) x - 2 (R^T W s)^T x \end{equation*}

Multiplying back by the positive constant one half, which does not change the minimizers either, yields:

\begin{equation*} \frac12 \| R x - s \|_W^2 \propto \frac12 x^T P x + q^T x \end{equation*}

with \(P = R^T W R\) and \(q = - R^T W s\).
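We can sanity-check this conversion numerically: with random data, the LS and QP objectives should differ exactly by the constant \((1/2) s^T W s\) we discarded along the way. A minimal sketch:

import numpy as np

rng = np.random.default_rng(42)
R = rng.standard_normal((4, 3))
s = rng.standard_normal(4)
W = np.diag([1.0, 2.0, 3.0, 4.0])  # positive semi-definite weight matrix

P = R.T.dot(W).dot(R)   # P = R^T W R
q = -R.T.dot(W).dot(s)  # q = -R^T W s

x = rng.standard_normal(3)
residual = R.dot(x) - s
ls_cost = 0.5 * residual.dot(W).dot(residual)
qp_cost = 0.5 * x.dot(P).dot(x) + q.dot(x)
assert np.isclose(ls_cost, qp_cost + 0.5 * s.dot(W).dot(s))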

Example in Python

The qpsolvers Python module for quadratic programming provides a solve_ls function alongside its main solve_qp function. This function boils down to:

from numpy import dot
from qpsolvers import solve_qp

def solve_ls(R, s, G, h, A, b, lb, ub, W, solver='quadprog'):
    WR = dot(W, R)                # W R
    P = dot(R.transpose(), WR)    # P = R^T W R
    q = -dot(s.transpose(), WR)   # q = -R^T W s, as a 1-D array
    return solve_qp(P, q, G, h, A, b, lb, ub, solver=solver)
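As a usage sketch, here is how one might call this function on a small made-up problem (assuming a QP solver such as quadprog is installed):

import numpy as np

R = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
s = np.array([3.0, 2.0, 3.0])
W = np.eye(3)                # uniform weights
G = np.array([[1.0, 1.0]])   # inequality: x_1 + x_2 <= 1
h = np.array([1.0])
A = np.array([[1.0, -1.0]])  # equality: x_1 = x_2
b = np.array([0.0])
lb = -np.ones(2)             # box bounds: -1 <= x <= 1
ub = np.ones(2)

x = solve_ls(R, s, G, h, A, b, lb, ub, W)
print(f"Optimal solution: x = {x}")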

To go further

LS and QP are not exactly equivalent problem classes (thanks to Adrien Escande for correcting this in an earlier version of this post): every LS problem can be cast to a QP, but a QP can only be cast back to an LS problem when the vector \(q\) lies in the image of the matrix \(P\).
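For illustration, here is a minimal sketch of this reverse direction (not from the original post), assuming \(P\) is positive definite so that its Cholesky factorization \(P = L L^T\) exists: taking \(W = I\) and \(R = L^T\) gives \(R^T W R = P\), and solving \(L s = -q\) recovers \(s\):

import numpy as np

P = np.array([[4.0, 1.0], [1.0, 2.0]])  # positive definite
q = np.array([-1.0, 0.5])

L = np.linalg.cholesky(P)   # P = L L^T
R = L.T                     # so that R^T R = L L^T = P
s = np.linalg.solve(L, -q)  # L s = -q  <=>  q = -R^T s

# The two objectives agree up to the constant (1/2) s^T s
x = np.array([0.3, -0.7])
qp_cost = 0.5 * x.dot(P).dot(x) + q.dot(x)
ls_cost = 0.5 * np.sum((R.dot(x) - s) ** 2)
assert np.isclose(ls_cost, qp_cost + 0.5 * s.dot(s))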

The page on least squares gives more details, for instance on using the weight matrix to steer the optimum of the problem, or on switching to lexicographic optimization when LS is not expressive enough. For a stronger focus on implementation, you can also check out the more practical blog post on quadratic programming in Python.
