Conversion from Least Squares to Quadratic Programming

Casting a linear least squares to a quadratic program is a common question for newcomers, who see this operation routinely mentioned or taken for granted in writing. Let us review the details of this derivation using linear algebra over normed vector spaces.

Definitions

Least Squares

A constrained weighted linear least squares (LS) problem is written as:

\begin{equation*} \begin{array}{rl} \underset{x \in \mathbb{R}^n}{\mathrm{minimize}} & \frac{1}{2} \| R x - s \|^2_W = \frac{1}{2} (R x - s)^T W (R x - s) \\ \mathrm{subject\ to} & G x \leq h \\ & A x = b \end{array} \end{equation*}

This problem seeks the vector \(x\) of optimization variables such that \(R x\) is as "close" as possible to a prescribed vector \(s\), meanwhile satisfying a set of linear inequality and equality constraints defined by the matrix-vector couples \((G, h)\) and \((A, b)\), respectively. Vector inequalities apply coordinate by coordinate. How close the two vectors \(R x\) and \(s\) are is measured via the weighted norm \(\| y \|_W = \sqrt{y^T W y}\), where the weight matrix \(W\) is positive semi-definite and usually diagonal. The overall function \(x \mapsto (1/2) \| R x - s \|^2_W\) is called the objective function of the problem.

Quadratic Programming

A quadratic program (QP) is written in standard form as:

\begin{equation*} \begin{array}{rl} \underset{x \in \mathbb{R}^n}{\mathrm{minimize}} & \frac12 x^T P x + q^T x \\ \mathrm{subject\ to} & G x \leq h \\ & A x = b \end{array} \end{equation*}

This problem seeks the vector \(x\) of optimization variables such that the quadratic objective function defined by the (positive semi-definite) matrix \(P\) and vector \(q\) is minimized, meanwhile satisfying a set of linear inequality and equality constraints defined by the matrix-vector couples \((G, h)\) and \((A, b)\), respectively. Vector inequalities apply coordinate by coordinate.

Conversion of the objective function

Equivalence relation over optimization problems

Inequality and equality constraints being identical between LS and QP, the only difference between the two lies in their respective objective functions. We will say that two objective functions \(f(x)\) and \(g(x)\) are equivalent, written as \(f(x) \propto g(x)\), if they produce the same optimum:

\begin{equation*} f(x) \propto g(x) \ \Longleftrightarrow \ \underset{x \in \mathbb{R}^n}{\mathrm{argmin}} \left\{ f(x) | G x \leq h, A x = b \right\} = \underset{x \in \mathbb{R}^n}{\mathrm{argmin}} \left\{ g(x) | G x \leq h, A x = b \right\} \end{equation*}

Informally, the symbol \(\propto\) thus means "does not change the optimum of the optimization problem". Our goal, starting from the LS objective function \(f(x) = (1/2) \| R x - s \|^2_W\), is to find a QP objective function \(g(x) = (1/2) x^T P x + q^T x\) such that \(f(x) \propto g(x)\).

Derivation of an equivalent quadratic program

Let us start from a linear least squares problem in standard form. The multiplicative constant \((1/2)\) being here by convention (and not affecting the optimum in any way), we can forget it for now and expand the weighted norm as:

\begin{equation*} \begin{array}{rcl} \| R x - s \|_W^2 & = & (R x - s)^T W (R x - s) \\ & = & x^T R^T W R x - x^T R^T W s - s^T W R x + s^T W s \\ \end{array} \end{equation*}

We can discard the constant term \(s^T W s\), which does not depend on the optimization vector \(x\) and therefore doesn't affect the optimum of the problem. We then have:

\begin{equation*} \| R x - s \|_W^2 \propto x^T R^T W R x - x^T R^T W s - s^T W R x \end{equation*}

Next, recall that \(W\) is positive semi-definite, hence in particular symmetric: \(W^T = W\). The number \(s^T W R x\) being real, it is equal to its own transpose, that is:

\begin{equation*} s^T W R x = (s^T W R x)^T = x^T R^T W^T s = x^T R^T W s \end{equation*}

Back to our rewriting of the objective function, we get to:

\begin{equation*} \| R x - s \|_W^2 \propto x^T (R^T W R) x - 2 (R^T W s)^T x \end{equation*}

Multiplying by the positive constant \((1/2)\) yields:

\begin{equation*} \frac12 \| R x - s \|_W^2 \propto \frac12 x^T P x + q^T x \end{equation*}

with \(P = R^T W R\) and \(q = - R^T W s\).

Example in Python

The qpsolvers Python module for quadratic programming provides a solve_ls function alongside its main solve_qp function. This function boils down to:

def solve_ls(R, s, G, h, A, b, lb, ub, W, solver='quadprog'):
    WR = dot(W, R)
    P = dot(R.transpose(), WR)
    q = -dot(s.transpose(), WR)
    return solve_qp(P, q, G, h, A, b, lb, ub, solver=solver)

To go further

LS and QP are not exactly equivalent (thanks to Adrien Escande for correcting this in an earlier version of this post): LS can be cast to QP, but QP can only be cast to LS when the vector \(q\) is in the image of the matrix \(P\).

The page on least squares gives more details on e.g. using the weight matrix to steer the optimum of the problem, or switching to lexicographic optimization when LS is not expressive enough. For more focus on implementation, you can also check out the more practical blog post on quadratic programming in Python.

Pages of this website are under the CC-BY 4.0 license.