Casting a linear least-squares problem to a quadratic program is a common question for newcomers, as this operation is routinely mentioned or taken for granted in the literature. Let us review the details of this derivation using linear algebra over normed vector spaces.
Definitions
Least Squares
A constrained weighted linear least squares (LS) problem is written as:

$$
\begin{align}
\underset{x}{\text{minimize}} \quad & \frac{1}{2} \| R x - s \|_W^2 = \frac{1}{2} (R x - s)^T W (R x - s) \\
\text{subject to} \quad & G x \leq h \\
& A x = b
\end{align}
$$
This problem seeks the vector of optimization variables $x \in \mathbb{R}^n$ such that $R x$ is as "close" as possible to a prescribed vector $s$, meanwhile satisfying a set of linear inequality and equality constraints defined by the matrix-vector couples $(G, h)$ and $(A, b)$, respectively. Vector inequalities apply coordinate by coordinate. How close the two vectors $R x$ and $s$ are is measured via the weighted norm $\| R x - s \|_W = \sqrt{(R x - s)^T W (R x - s)}$, where the weight matrix $W$ is positive semi-definite and usually diagonal. The overall function $x \mapsto \frac{1}{2} \| R x - s \|_W^2$ is called the objective function (a.k.a. cost function) of the problem.
Quadratic Programming
A quadratic program (QP) is written in standard form as:

$$
\begin{align}
\underset{x}{\text{minimize}} \quad & \frac{1}{2} x^T P x + q^T x \\
\text{subject to} \quad & G x \leq h \\
& A x = b
\end{align}
$$
This problem seeks the vector of optimization variables $x \in \mathbb{R}^n$ such that the quadratic objective function defined by the (positive semi-definite) matrix $P$ and vector $q$ is minimized, meanwhile satisfying a set of linear inequality and equality constraints defined by the matrix-vector couples $(G, h)$ and $(A, b)$, respectively. Vector inequalities apply coordinate by coordinate.
Equivalence between optimization problems
Let us define formally what it means to "cast" a least squares to a quadratic program. Since they have the same inequality and equality constraints, the only difference between them lies in their objective functions. We will say that two objective functions $f$ and $g$ are equivalent, written $f \sim g$, if they have the same optimum:

$$
f \sim g \quad \Leftrightarrow \quad \underset{G x \leq h,\ A x = b}{\text{argmin}}\ f(x) = \underset{G x \leq h,\ A x = b}{\text{argmin}}\ g(x)
$$
Informally, the symbol $\sim$ thus means "does not change the optimum of the optimization problem".
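For instance, over $x \in \mathbb{R}$ with no constraints:

$$
f(x) = x^2 \sim g(x) = 2 x^2 - 3
$$

since both functions are minimized at $x^* = 0$, even though their minimal values differ. More generally, scaling an objective by a positive constant or adding a constant to it does not change the optimum; we will use both facts below.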
Conversion to a quadratic objective
Our goal, starting from the LS objective function $f(x) = \frac{1}{2} \| R x - s \|_W^2$, is to find a QP objective function $g(x) = \frac{1}{2} x^T P x + q^T x$ such that $f \sim g$. The one-half multiplicative constant being here by convention (and not affecting the optimum in any way), we can forget it for now and expand the weighted norm as:

$$
\| R x - s \|_W^2 = (R x - s)^T W (R x - s) = x^T R^T W R x - x^T R^T W s - s^T W R x + s^T W s
$$
We can discard the constant term $s^T W s$ as it does not depend on the optimization vector $x$ and therefore doesn't affect the optimum of our problem. We then have:

$$
\| R x - s \|_W^2 \sim x^T R^T W R x - x^T R^T W s - s^T W R x
$$
Next, recall that $W$ is positive semi-definite, hence in particular symmetric: $W^T = W$. The number $s^T W R x$ being real, it is equal to its own transpose, that is:

$$
s^T W R x = (s^T W R x)^T = x^T R^T W^T s = x^T R^T W s
$$
Back to our rewriting of the objective function, we get:

$$
\| R x - s \|_W^2 \sim x^T R^T W R x - 2 s^T W R x
$$
Multiplying by the positive one-half constant yields:

$$
\frac{1}{2} \| R x - s \|_W^2 \sim \frac{1}{2} x^T P x + q^T x
$$
with $P = R^T W R$ and $q = -R^T W s$.
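As a quick sanity check, the following sketch (mine, not from the post) verifies on a random unconstrained instance that the minimizer of the weighted LS objective, obtained from the weighted normal equations, coincides with the minimizer of the QP objective built from $P$ and $q$:

```python
import numpy as np

rng = np.random.default_rng(42)
R = rng.standard_normal((5, 3))
s = rng.standard_normal(5)
W = np.diag(rng.uniform(1.0, 2.0, size=5))  # positive definite diagonal weights

P = R.T @ W @ R
q = -R.T @ W @ s

x_ls = np.linalg.solve(R.T @ W @ R, R.T @ W @ s)  # weighted normal equations
x_qp = np.linalg.solve(P, -q)  # stationarity of the QP objective: P x + q = 0

assert np.allclose(x_ls, x_qp)
```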
Example in Python
The qpsolvers Python module for quadratic programming provides a solve_ls function alongside its main solve_qp function. For dense problems, this function boils down to:
```python
from numpy import dot
from qpsolvers import solve_qp


def solve_dense_ls(R, s, G, h, A, b, lb, ub, W, solver):
    WR = dot(W, R)
    P = dot(R.transpose(), WR)  # P = R^T W R
    q = -dot(s.transpose(), WR)  # q = -R^T W s
    return solve_qp(P, q, G, h, A, b, lb, ub, solver=solver)
```
The full function in the library is only slightly more complex to handle combinations of dense and sparse matrices.
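As a usage sketch, here is how one might call solve_ls on a small dense problem; the solver name is an assumption (any QP backend installed alongside qpsolvers, such as quadprog here, would do):

```python
import numpy as np
from qpsolvers import solve_ls

R = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]])
s = np.array([3.0, 2.0, 1.0])
G = np.array([[1.0, 1.0, 1.0]])  # single inequality: x_0 + x_1 + x_2 <= 2
h = np.array([2.0])

x = solve_ls(R, s, G, h, solver="quadprog")  # assumes quadprog is installed
print(f"Solution: x = {x}")
```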
Sparse conversion
The above conversion is not optimal for sparse problems, as forming the matrix $P = R^T W R$ will likely lead to a fully-dense matrix even when $R$ is sparse. In that situation, a better strategy is to define a new variable:

$$
y = R x - s
$$
And formulate the following quadratic program:

$$
\begin{align}
\underset{x, y}{\text{minimize}} \quad & \frac{1}{2} y^T W y \\
\text{subject to} \quad & G x \leq h \\
& A x = b \\
& R x - y = s
\end{align}
$$
This alternative conversion maintains sparsity at the expense of increasing the number of optimization variables from $n$ to $n + m$, where $m$ is the number of rows of $R$. It will likely perform better than the previous conversion on most sparse problems, but it may perform worse in some cases, for instance if $R$ has many more rows than columns. Consider the structure of the problem at hand to decide on which conversion strategy to follow.
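To illustrate the densification issue concretely (an example of mine, not from the post): a sparse $R$ with a single dense row is enough to make $R^T R$ completely dense:

```python
import numpy as np
import scipy.sparse as spa

n = 1000
# Identity matrix stacked with one dense row of ones
R = spa.vstack(
    [spa.eye(n, format="csc"), spa.csc_matrix(np.ones((1, n)))],
    format="csc",
)
P = (R.T @ R).tocsc()
print(R.nnz)  # 2000 nonzero entries
print(P.nnz)  # 1000000 nonzero entries: P is fully dense
```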
Example in Python
For sparse problems, the solve_ls function of qpsolvers boils down to:
```python
import numpy as np
import scipy.sparse as spa
from qpsolvers import solve_qp


def solve_sparse_ls(R, s, G, h, A, b, lb, ub, W, solver):
    m, n = R.shape
    zeros_nn = spa.csc_matrix((n, n))
    eye_m = spa.eye(m, format="csc")
    # Objective: 1/2 * y^T W y over the stacked variable (x, y)
    P = spa.block_diag([zeros_nn, W], format="csc")
    q = np.zeros(n + m)
    # Pad the original constraints with zero columns for y
    G = spa.hstack([G, spa.csc_matrix((G.shape[0], m))], format="csc")
    A = spa.hstack([A, spa.csc_matrix((A.shape[0], m))], format="csc")
    # Add the new equality constraint R x - y = s
    Rx_minus_y = spa.hstack([R, -eye_m], format="csc")
    A = spa.vstack([A, Rx_minus_y], format="csc")
    b = np.hstack([b, s])
    # Box constraints only apply to x: y is left unbounded
    lb = np.hstack([lb, np.full((m,), -np.inf)])
    ub = np.hstack([ub, np.full((m,), np.inf)])
    xy = solve_qp(P, q, G, h, A, b, lb, ub, solver=solver)
    return xy[:n] if xy is not None else None
```
The full function in the library is more complex, as it handles combinations of defined and undefined inputs as well as dense and sparse matrices.
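As a usage sketch (my own example; the solver name assumes the OSQP backend is installed), solve_ls accepts SciPy sparse matrices directly:

```python
import numpy as np
import scipy.sparse as spa
from qpsolvers import solve_ls

n = 500
R = spa.eye(n, format="csc")
s = np.ones(n)
lb = np.zeros(n)
ub = np.full(n, 0.5)

x = solve_ls(R, s, lb=lb, ub=ub, solver="osqp")  # assumes OSQP is installed
print(x)  # the upper bound is active: all coordinates are clamped to 0.5
```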
To go further
LS and QP are not exactly equivalent: LS can be cast to QP, but QP can only be cast to LS when the vector $q$ is in the image of the matrix $P$.
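To see why, take $W = I$ and try to match the two objectives:

$$
\frac{1}{2} x^T P x + q^T x \sim \frac{1}{2} \| R x - s \|_2^2 \quad \Longleftrightarrow \quad P = R^T R \ \text{ and } \ q = -R^T s
$$

A factorization $P = R^T R$ always exists because $P$ is positive semi-definite, but the equation $q = -R^T s$ has a solution $s$ only when $q$ lies in the image of $R^T$, which coincides with the image of $R^T R = P$.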
The post on least squares gives more details on e.g. using the weight matrix $W$ to steer the optimum of the problem, or switching to lexicographic optimization when LS is not expressive enough. For more focus on implementation, you can also check out the more practical blog post on quadratic programming in Python.
Thanks to Adrien Escande for pointing out, in an earlier version of this post, that LS and QP are not exactly equivalent.
Thanks to Brendan O'Donoghue for pointing out the limitation of the original dense conversion and suggesting the sparse conversion alternative.
Discussion
TH:
The line just above "Example in Python" in Section "Conversion to a quadratic objective" should have been $q = R^T W s$, i.e. without the minus sign?
Stéphane:
That formula seems correct. Here is a more detailed derivation, feel free to point out any step where you see a mistake:

$$
\frac{1}{2} \| R x - s \|_W^2 \sim \frac{1}{2} x^T R^T W R x - s^T W R x
$$
Since $s^T W R x$ is a scalar, it verifies $s^T W R x = (s^T W R x)^T = x^T R^T W s$, and therefore $-s^T W R x = (-R^T W s)^T x$.
From which we can identify $P = R^T W R$ and finally $q = -R^T W s$.
GA:
Is there a reason why one would prefer one of the forms of the OLS problem to the other? Would writing OLS as a quadratic program make it easier to solve?
Stéphane:
QP is more general than least squares (LS can be cast to QP, but QP can only be cast to LS when the vector $q$ in the objective function lies in the image of the matrix $P$), hence casting the problem as a QP does not make it easier to solve. One reason for doing so is purely practical: many off-the-shelf solvers expect quadratic programs as inputs (for instance there are 16 of them listed in qpsolvers), but I don't know many solvers accepting inputs in least squares format. Feel free to share links if you encounter some 🙂