Simple linear regression with online updates

Simple linear regression is a particular case of linear regression where we assume that the output yR\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} y \in \mathbb{R} is an affine transform of the input xR\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} x \in \mathbb{R}:

y=α+βx\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} y = \alpha + \beta x

We gather observations (xi,yi)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (x_i, y_i) and search for the parameters (α,β)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (\alpha, \beta) that minimize a loss function such as the residual sum of squares. Graphically, we are fitting the line that minimizes the sum of squared distances to every point:

Simple linear regression

Compared to a full-fledged linear regression, simple linear regression has a number of properties we can use for faster computations and online updates. In this post, we will see how to implement it in Python, first with a fixed dataset, then with online updates, and finally with a sliding-window variant where recent observations have more weight than older ones.

Linear regression

Let us start with a vanilla solution using an ordinary linear regression solver. We stack our inputs into a vector x=[x1,,xN]\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfx = [x_1, \ldots, x_N]^\top and their corresponding outputs into y=[y1,,yN]\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfy = [y_1, \ldots, y_N]^\top. Our objective function is the residual sum of squares:

minimizeα,β yα1βx22\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \underset{\alpha, \beta}{\mathrm{minimize}} \ \| \bfy - \alpha \bfone - \beta \bfx \|_2^2

where 1\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfone is the column vector filled with ones that has the same dimension as x\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfx and y\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfy. The parameter α\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha is called the intercept while β\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta is the slope (or coefficient).

Using a machine learning library

For our tests, we will generate data using the same model as for the figure above:

import numpy as np

def generate_dataset(size=10000, alpha=1.5, beta=3.15):
    x = np.random.random(size)
    y = alpha + beta * x + np.random.normal(0, 0.4, size=x.shape)
    return (x, y)

Any machine learning library will happily estimate the intercept and coefficients of a linear model from such observation data. Try it out with your favorite one! Here is for example how linear regression goes with scikit-learn:

import sklearn.linear_model

def simple_linear_regression_sklearn(x, y):
    regressor = sklearn.linear_model.LinearRegression()
    regressor.fit(x.reshape(-1, 1), y)
    intercept = regressor.intercept_
    slope = regressor.coef_[0]
    return (intercept, slope)

Yet, our problem is simpler than a general linear regression. What if we solved the least squares problem directly?

Using a least squares solver

We can rewrite our objective function in matrix form as:

yα1βx22=[1x][αβ]y22=Rxy22\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \| \bfy - \alpha \bfone - \beta \bfx \|_2^2 = \left\| \begin{bmatrix} \bfone & \bfx \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - \bfy \right\|_2^2 = \| \bfR \bfx - \bfy \|_2^2

with R\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfR the 2×N\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} 2 \times N matrix with a first column of ones and x\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfx as second column. We can give these matrices to any least squares solver, for instance in Python using the solve_ls function from qpsolvers:

import qpsolvers

def simple_linear_regression_ls(x, y, solver="quadprog"):
    shape = (x.shape[0], 1)
    R = np.hstack([np.ones(shape), x.reshape(shape)])
    intercept, slope = qpsolvers.solve_ls(R, s=y, solver=solver)
    return (intercept, slope)

This results in a faster implementation than the previous one, for instance on my machine:

In [1]: x, y = generate_dataset(1000 * 1000)

In [2]: %timeit simple_linear_regression_sklearn(x, y)
21.2 ms ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit simple_linear_regression_ls(x, y, solver="quadprog")
11.4 ms ± 73.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The improvement may be due to avoiding extra conversions inside scikit-learn, or because the solver it uses internally is less efficient than quadprog for this type of problems. It is more significant for small datasets, and ultimately settles around 2×\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} 2\times for large datasets. Yet, we are still relying on a general least squares solution (we can add columns to R\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfR for more input variables), which means we are not squeezing out all the improvements we can get from simple linear regression.

Simple linear regression

Let us expand our objective function using the vector dot product:

y(α1+βx)22=((α1+βx)y)((α1+βx)y)=α1+βx222y(α1+βx)+(yy)=α2(11)+2αβ(1x)+β2(xx)2(1y)α2(xy)β+(yy)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} & \| \bfy - (\alpha \bfone + \beta \bfx) \|_2^2 \\ & = ((\alpha \bfone + \beta \bfx) - \bfy) \cdot ((\alpha \bfone + \beta \bfx) - \bfy) \\ & = \| \alpha \bfone + \beta \bfx \|_2^2 - 2 \bfy \cdot (\alpha \bfone + \beta \bfx) + (\bfy \cdot \bfy) \\ & = \alpha^2 (\bfone \cdot \bfone) + 2 \alpha \beta (\bfone \cdot \bfx) + \beta^2 (\bfx \cdot \bfx) - 2 (\bfone \cdot \bfy) \alpha - 2 (\bfx \cdot \bfy) \beta + (\bfy \cdot \bfy) \end{align*}

Since this function is strictly convex, its global minimum is the (unique) critical point where its gradient vanishes:

αy(α1+βx)22=0βy(α1+βx)22=0\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} \frac{\partial}{\partial \alpha} \| \bfy - (\alpha \bfone + \beta \bfx) \|_2^2 & = 0 & \frac{\partial}{\partial \beta} \| \bfy - (\alpha \bfone + \beta \bfx) \|_2^2 & = 0 \end{align*}

Using the dot product expansion, this condition becomes:

2(11)α+2(1x)β+02(1y)+0+0=00+2(1x)α+2(xx)β02(xy)+0=0\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} 2 (\bfone \cdot \bfone) \alpha + 2 (\bfone \cdot \bfx) \beta + 0 - 2 (\bfone \cdot \bfy) + 0 + 0 & = 0 \\ 0 + 2 (\bfone \cdot \bfx) \alpha + 2 (\bfx \cdot \bfx) \beta - 0 - 2 (\bfx \cdot \bfy) + 0 & = 0 \end{align*}

We end up with a 2×2\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} 2 \times 2 system of linear equations:

(11)α+(1x)β=(1y)(1x)α+(xx)β=(xy)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} (\bfone \cdot \bfone) \alpha + (\bfone \cdot \bfx) \beta & = (\bfone \cdot \bfy) \\ (\bfone \cdot \bfx) \alpha + (\bfx \cdot \bfx) \beta & = (\bfx \cdot \bfy) \end{align*}

We know its analytical solution by (manual Gaussian elimination or remembering the formula for) Cramer's rule:

α=(1y)(xx)(xy)(1x)(11)(xx)(1x)2β=(11)(xy)(1x)(1y)(11)(xx)(1x)2\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} \alpha & = \frac{(\bfone \cdot \bfy) (\bfx \cdot \bfx) - (\bfx \cdot \bfy) (\bfone \cdot \bfx)}{(\bfone \cdot \bfone) (\bfx \cdot \bfx) - (\bfone \cdot \bfx)^2} & \beta & = \frac{(\bfone \cdot \bfone) (\bfx \cdot \bfy) - (\bfone \cdot \bfx) (\bfone \cdot \bfy)}{(\bfone \cdot \bfone) (\bfx \cdot \bfx) - (\bfone \cdot \bfx)^2} \end{align*}

The numerators and denominator of these two expressions have a statistical interpretation detailed in fitting the regression line (Wikipedia). The denominator is positive by the Cauchy-Schwarz inequality. It can be zero in the degenerate case where x=x1\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfx = x \bfone, that is, when all input values are equal, but we will assume that's not the case in what follows. All in all, the Python code corresponding to this solution needs only NumPy and floating-point arithmetic:

def simple_linear_regression_cramer(x, y):
    dot_1_1 = x.shape[0]  # == np.dot(ones, ones)
    dot_1_x = x.sum()  # == np.dot(x, ones)
    dot_1_y = y.sum()  # == np.dot(y, ones)
    dot_x_x = np.dot(x, x)
    dot_x_y = np.dot(x, y)
    det = dot_1_1 * dot_x_x - dot_1_x ** 2  # not checking det == 0
    intercept = (dot_1_y * dot_x_x - dot_x_y * dot_1_x) / det
    slope = (dot_1_1 * dot_x_y - dot_1_x * dot_1_y) / det
    return intercept, slope

This implementation is not only faster than the previous ones, it's also the first one where we leveraged some particular structure of our problem:

In [1]: x, y = generate_dataset(1000 * 1000)

In [2]: %timeit simple_linear_regression_ls(x, y, solver="quadprog")
11.4 ms ± 36 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: %timeit simple_linear_regression_cramer(x, y)
3.72 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We can use some IPython magic to compare the three approaches we have seen so far in a quick benchmark:

sizes = sorted([k * 10 ** d for k in [2, 5, 10] for d in range(6)])
for size in sizes:
    x, y = generate_dataset(size)
    instructions = [
        "simple_linear_regression_sklearn(x, y)",
        'simple_linear_regression_ls(x, y, solver="quadprog")',
        "simple_linear_regression_cramer(x, y)",
    ]
    for instruction in instructions:
        print(instruction)  # helps manual labor to gather the numbers ;)
        get_ipython().magic(f"timeit {instruction}")

Here is the outcome in a plot (raw data in the image's alt attribute):

Benchmark of computation times for all three solutions. Here is the raw data: [(221e-6, 36.2e-6, 5.55e-6), (218e-6, 37.3e-6, 5.56e-6), (217e-6, 36.3e-6, 5.62e-6), (216e-6, 36.5e-6, 5.64e-6), (219e-6, 36.8e-6, 5.74e-6), (219e-6, 37.2e-6, 5.7e-6), (222e-6, 38.3e-6, 6.01e-6), (230e-6, 39.8e-6, 6.32e-6), (241e-6, 42.7e-6, 7.09e-6), (251e-6, 47.6e-6, 8.42e-6), (288e-6, 141e-6, 12.4e-6), (346e-6, 190e-6, 19.5e-6), (452e-6, 283e-6, 52.4e-6), (775e-6, 544e-6, 83.1e-6), (2.32e-3, 987e-6, 138e-6), (3.94e-3, 2.02e-3, 232e-6), (13.2e-3, 5.66e-3, 1.41e-3), (21.8e-3, 11.7e-3, 4.0e-3)].

We now have a reasonably efficient solution for simple linear regression over small- to medium-sized (millions of entries) datasets. All the solutions we have seen so far have been offline, meaning the dataset was generated then processed all at once. Let us now turn to the online case, where new observations (xi,yi)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (x_i, y_i) come one after the other.

Online simple linear regression

The dot products we use to calculate the intercept α\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha and slope β\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta are ready for recursive updates. Suppose we already now their values for k\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} k previous observations. When a new observation (xk+1,yk+1)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (x_{k+1}, y_{k+1}) comes in, we can update them by:

(11)k+1=(11)k+1(1x)k+1=(1x)k+xk+1(1y)k+1=(1y)k+yk+1(xx)k+1=(xx)k+xk+12(xy)k+1=(xy)k+xk+1yk+1\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} (\bfone \cdot \bfone)_{k+1} & = (\bfone \cdot \bfone)_k + 1 \\ (\bfone \cdot \bfx)_{k+1} & = (\bfone \cdot \bfx)_k + x_{k+1} \\ (\bfone \cdot \bfy)_{k+1} & = (\bfone \cdot \bfy)_k + y_{k+1} \\ (\bfx \cdot \bfx)_{k+1} & = (\bfx \cdot \bfx)_k + x_{k+1}^2 \\ (\bfx \cdot \bfy)_{k+1} & = (\bfx \cdot \bfy)_k + x_{k+1} y_{k+1} \end{align*}

The same logic generalizes to a new batch of observations (xk+1,yk+1)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (\bfx_{k+1}, \bfy_{k+1}):

(11)k+1=(11)k+dim(xk+1)(1x)k+1=(1x)k+(1xk+1)(1y)k+1=(1y)k+(1yk+1)(xx)k+1=(xx)k+(xk+1xk+1)(xy)k+1=(xy)k+(xk+1yk+1)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} (\bfone \cdot \bfone)_{k+1} & = (\bfone \cdot \bfone)_k + \mathrm{dim}(\bfx_{k+1}) \\ (\bfone \cdot \bfx)_{k+1} & = (\bfone \cdot \bfx)_k + (\bfone \cdot \bfx_{k+1}) \\ (\bfone \cdot \bfy)_{k+1} & = (\bfone \cdot \bfy)_k + (\bfone \cdot \bfy_{k+1}) \\ (\bfx \cdot \bfx)_{k+1} & = (\bfx \cdot \bfx)_k + (\bfx_{k+1} \cdot \bfx_{k+1}) \\ (\bfx \cdot \bfy)_{k+1} & = (\bfx \cdot \bfy)_k + (\bfx_{k+1} \cdot \bfy_{k+1}) \end{align*}

Keep in mind that, although we write them in vector form, dot products by 1\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfone are simply sums. We can wrap these update rules in a Python class for a simple linear regressor that keeps track of dot products internally, and outputs α\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha and β\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta upon new observations:

class SimpleLinearRegressor(object):
    def __init__(self):
        self.dots = np.zeros(5)
        self.intercept = None
        self.slope = None

    def update(self, x: np.ndarray, y: np.ndarray):
        self.dots += np.array(
            [
                x.shape[0],
                x.sum(),
                y.sum(),
                np.dot(x, x),
                np.dot(x, y),
            ]
        )
        size, sum_x, sum_y, sum_xx, sum_xy = self.dots
        det = size * sum_xx - sum_x ** 2
        if det > 1e-10:  # determinant may be zero initially
            self.intercept = (sum_xx * sum_y - sum_xy * sum_x) / det
            self.slope = (sum_xy * size - sum_x * sum_y) / det

This solution is equivalent to this answer from StackOverflow. It treats all observations the same, which is pertinent for non-time series data. But we can also tweak our derivation so far to the more specific use case of sliding-window regression over time series data.

Sliding-window simple linear regression

Suppose now that single observations (xi,yi)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (x_i, y_i) come in succession every δt\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \delta t timesteps, and we want to estimate the instantaneous slope β\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta and intercept α\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha of our recent data. Similarly to the exponential moving average filter, we can define a forgetting factor λ(0,1)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \lambda \in (0, 1) as:

λ=exp(δtT)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \lambda = \exp\left(-\frac{\delta t}{T}\right)

where T\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} T is the time constant that characterizes the span of our sliding window. Note that, by the Nyquist–Shannon sampling theorem, T\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} T has to be at least 2δt\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} 2 \delta t to avoid aliasing. We can use this forgetting factor in our objective function as a discount rate applied to previous observations:

fit(α,β,xk,yk)=i=0k1λi(ykiαβxki)2\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \mathit{fit}(\alpha, \beta, \bfx_k, \bfy_k) = \sum_{i=0}^{k-1} \lambda^{i} (y_{k-i} - \alpha - \beta x_{k-i})^2

To calculate this exponentially discounted residual sum of squares efficiently, we can modify the definition of our vectors xk\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfx_k, yk\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfy_k, and 1k\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfone_k, to incorporate the forgetting factor:

xk+1=[λxkxk+1]yk+1=[λykyk+1]1k+1=[λ1k1]\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} \bfx_{k+1} & = \begin{bmatrix} \sqrt{\lambda} \bfx_k \\ x_{k+1} \end{bmatrix} & \bfy_{k+1} & = \begin{bmatrix} \sqrt{\lambda} \bfy_k \\ y_{k+1} \end{bmatrix} & \bfone_{k+1} & = \begin{bmatrix} \sqrt{\lambda} \bfone_k \\ 1 \end{bmatrix} \end{align*}

The benefit of this approach is that the squared norms of our new vectors now correspond to the exponentially discounted residual sum of squares:

yk+1α1k+1βxk+122=(yk+1αβxk+1)2+λykα1kβxk22= =i=0kλi(yk+1iαβxk+1i)2=fit(α,β,xk+1,yk+1)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} \| \bfy_{k+1} - \alpha \bfone_{k+1} - \beta \bfx_{k+1} \|_2^2 & = (y_{k+1} - \alpha - \beta x_{k+1})^2 + \lambda \| \bfy_{k} - \alpha \bfone_k - \beta \bfx_k \|_2^2 \\ & = \ \ldots \\ & = \sum_{i=0}^{k} \lambda^{i} (y_{k+1-i} - \alpha - \beta x_{k+1-i})^2 \\ & = \mathit{fit}(\alpha, \beta, \bfx_{k+1}, \bfy_{k+1}) \end{align*}

Granted, we took some liberties with the notation 1k\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfone_k as it's not a vector of ones any more ;-) But the expression (1kvk)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} (\bfone_k \cdot \bfv_k) keeps the semantic of "summing the coordinates of vk\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \bfv_k", although is now in an exponentially discounted sense. In terms of dot products, our update rules translate to:

(11)k+1=λ(11)k+1(1x)k+1=λ(1x)k+xk+1(1y)k+1=λ(1y)k+yk+1(xx)k+1=λ(xx)k+xk+12(xy)k+1=λ(xy)k+xk+1yk+1\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \begin{align*} (\bfone \cdot \bfone)_{k+1} & = \lambda (\bfone \cdot \bfone)_k + 1 \\ (\bfone \cdot \bfx)_{k+1} & = \lambda (\bfone \cdot \bfx)_k + x_{k+1} \\ (\bfone \cdot \bfy)_{k+1} & = \lambda (\bfone \cdot \bfy)_k + y_{k+1} \\ (\bfx \cdot \bfx)_{k+1} & = \lambda (\bfx \cdot \bfx)_k + x_{k+1}^2 \\ (\bfx \cdot \bfy)_{k+1} & = \lambda (\bfx \cdot \bfy)_k + x_{k+1} y_{k+1} \end{align*}

This leads to a regressor class similar to the previous one:

class SlidingWindowLinearRegressor(object):
    def __init__(self, dt, time_constant):
        self.dots = np.zeros(5)
        self.intercept = None
        self.forgetting_factor = np.exp(-dt / time_constant)
        self.slope = None

    def update(self, x: float, y: float):
        self.dots *= self.forgetting_factor
        self.dots += np.array([1, x, y, x * x, x * y])
        dot_11, dot_1x, dot_1y, dot_xx, dot_xy = self.dots
        det = dot_11 * dot_xx - dot_1x ** 2
        if det > 1e-10:  # determinant is zero initially
            self.intercept = (dot_xx * dot_1y - dot_xy * dot_1x) / det
            self.slope = (dot_xy * dot_11 - dot_1x * dot_1y) / det

Let's evaluate its performance on synthetic data:

dt = 1e-3  # [s]
T = 5e-2  # [s]
sliding_window_regressor = SlidingWindowLinearRegressor(dt, T)
for t in np.arange(0., 100., dt):
    alpha = 1.5 * np.sin(0.1 * t)
    beta = 3.15 + 0.5 * np.cos(1.1 * t)
    x = np.cos(t) + np.random.normal(0, 0.05)
    y = alpha + beta * x + np.random.normal(0, 0.1)
    sliding_window_regressor.update(x, y)

This small test results in the following observations and estimates:

Estimates during sliding-window simple linear regression.

Contrary to regular simple linear regression, our filter now has a time-constant parameter T\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} T that defines the length of our sliding window. Its ideal value depends on our data. In this example, it would depend on all the inline parameters [1.5, 0.1, 3.15, 0.5, 1.1, 0.05, 0.1] we use to produce α(t),β(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha(t), \beta(t) and the white noise on x(t),y(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} x(t), y(t). Although, mainly, it will depend on the two constants [0.1, 1.1] that define how fast α(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha(t) and β(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta(t) change. If we choose it too small, our estimates will be snappy but noisy. For example with T = 5e-3 s:

Estimates during sliding-window linear regression with a small time constant.

If we choose it too large, our estimates will be smooth but delayed. For example with T = 5.0 s:

Estimates during sliding-window linear regression with a large time constant.

In this instance, the filter may be acceptable for α(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha(t) but it fails to capture the time variations of β(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta(t) (if we crank T\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} T up further we can even average them out completely). How to tune it from there depends on the desired behavior.

To go further

Sliding window linear regression combines statistics and signal processing. Its parameter T\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} T is the same time constant as in a low-pass filter, and it can be selected by specifying a frequency response, which does formally what we commented on informally in our example when selecting T\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} T to "capture" or not the time-varying dynamics of α(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \alpha(t) and β(t)\def\bfA{\boldsymbol{A}} \def\bfB{\boldsymbol{B}} \def\bfC{\boldsymbol{C}} \def\bfD{\boldsymbol{D}} \def\bfE{\boldsymbol{E}} \def\bfF{\boldsymbol{F}} \def\bfG{\boldsymbol{G}} \def\bfH{\boldsymbol{H}} \def\bfI{\boldsymbol{I}} \def\bfJ{\boldsymbol{J}} \def\bfK{\boldsymbol{K}} \def\bfL{\boldsymbol{L}} \def\bfM{\boldsymbol{M}} \def\bfN{\boldsymbol{N}} \def\bfO{\boldsymbol{O}} \def\bfP{\boldsymbol{P}} \def\bfQ{\boldsymbol{Q}} \def\bfR{\boldsymbol{R}} \def\bfS{\boldsymbol{S}} \def\bfT{\boldsymbol{T}} \def\bfU{\boldsymbol{U}} \def\bfV{\boldsymbol{V}} \def\bfW{\boldsymbol{W}} \def\bfX{\boldsymbol{X}} \def\bfY{\boldsymbol{Y}} \def\bfZ{\boldsymbol{Z}} \def\bfalpha{\boldsymbol{\alpha}} \def\bfa{\boldsymbol{a}} \def\bfbeta{\boldsymbol{\beta}} \def\bfb{\boldsymbol{b}} \def\bfcd{\dot{\bfc}} \def\bfchi{\boldsymbol{\chi}} \def\bfc{\boldsymbol{c}} \def\bfd{\boldsymbol{d}} \def\bfe{\boldsymbol{e}} \def\bff{\boldsymbol{f}} \def\bfgamma{\boldsymbol{\gamma}} \def\bfg{\boldsymbol{g}} \def\bfh{\boldsymbol{h}} \def\bfi{\boldsymbol{i}} \def\bfj{\boldsymbol{j}} \def\bfk{\boldsymbol{k}} \def\bflambda{\boldsymbol{\lambda}} \def\bfl{\boldsymbol{l}} \def\bfm{\boldsymbol{m}} \def\bfn{\boldsymbol{n}} \def\bfomega{\boldsymbol{\omega}} \def\bfone{\boldsymbol{1}} \def\bfo{\boldsymbol{o}} \def\bfpdd{\ddot{\bfp}} \def\bfpd{\dot{\bfp}} \def\bfphi{\boldsymbol{\phi}} \def\bfp{\boldsymbol{p}} \def\bfq{\boldsymbol{q}} \def\bfr{\boldsymbol{r}} \def\bfsigma{\boldsymbol{\sigma}} \def\bfs{\boldsymbol{s}} \def\bftau{\boldsymbol{\tau}} \def\bftheta{\boldsymbol{\theta}} \def\bft{\boldsymbol{t}} \def\bfu{\boldsymbol{u}} \def\bfv{\boldsymbol{v}} \def\bfw{\boldsymbol{w}} \def\bfxi{\boldsymbol{\xi}} \def\bfx{\boldsymbol{x}} \def\bfy{\boldsymbol{y}} \def\bfzero{\boldsymbol{0}} \def\bfz{\boldsymbol{z}} \def\defeq{\stackrel{\mathrm{def}}{=}} \def\p{\boldsymbol{p}} \def\qdd{\ddot{\bfq}} \def\qd{\dot{\bfq}} \def\q{\boldsymbol{q}} \def\xd{\dot{x}} \def\yd{\dot{y}} \def\zd{\dot{z}} \beta(t). The overall algorithm itself is a simplification of the recursive least squares filter to our use case.

When deriving our system of linear inequalities for offline regression, the condition we used that the gradient vanishes at the global minimum of the objective function is a particular case of Karush–Kuhn–Tucker (KKT) conditions applied to our unconstrained problem. KKT conditions provide the general solution when the problem also has inequality or equality constraints.

Bonus: simple linear regression with JAX

For fun, we can look at the performance of the offline algorithm using JAX's just-in-time compiler. JAX also gives us a convenient way to try running the function on the CPU or GPU, so let's try both:

import jax
import jax.numpy as jnp

def simple_linear_regression_cramer_jax(x: jnp.array, y: jnp.array):
    dot_1_1 = x.shape[0]  # == jnp.dot(ones, ones)
    dot_1_x = x.sum()  # == jnp.dot(x, ones)
    dot_1_y = y.sum()  # == jnp.dot(y, ones)
    dot_x_x = jnp.dot(x, x)
    dot_x_y = jnp.dot(x, y)
    det = dot_1_1 * dot_x_x - dot_1_x ** 2  # not checking det == 0
    intercept = (dot_1_y * dot_x_x - dot_x_y * dot_1_x) / det
    slope = (dot_1_1 * dot_x_y - dot_1_x * dot_1_y) / det
    return intercept, slope

cpu = jax.devices("cpu")[0]
gpu = jax.devices("gpu")[0]
cramer_jax_cpu = jax.jit(simple_linear_regression_cramer_jax, device=cpu)
cramer_jax_gpu = jax.jit(simple_linear_regression_cramer_jax, device=gpu)

for size in sizes:
    x, y = generate_dataset(size)
    jx, jy = jnp.array(x), jnp.array(y)
    instructions = ["cramer_jax_cpu(jx, jy)", "cramer_jax_gpu(jx, jy)"]
    for instruction in instructions:
        get_ipython().magic(f"timeit {instruction}")

Here are the results of this benchmark on my machine (CPU: Intel® Core™ i7-6700K at 4.00 GHz, GPU: NVIDIA GeForce GTX 1070 with CUDA 10.1), combined with the ones from the previous benchmark above:

Benchmark of computation times for all three solutions. Here is the raw data: [(221e-6, 36.2e-6, 4.86e-6, 36.9e-6, 31.0e-6), (218e-6, 37.3e-6, 4.86e-6, 33.7e-6, 30.0e-6), (217e-6, 36.3e-6, 4.92e-6, 36.6e-6, 32.4e-6), (216e-6, 36.5e-6, 4.86e-6, 34.6e-6, 30.9e-6), (219e-6, 36.8e-6, 4.96e-6, 35.0e-6, 33.0e-6), (219e-6, 37.2e-6, 4.97e-6, 35.3e-6, 33.2e-6), (222e-6, 38.3e-6, 5.11e-6, 36.2e-6, 32.0e-6), (230e-6, 39.8e-6, 5.48e-6, 39.7e-6, 33.8e-6), (241e-6, 42.7e-6, 6.12e-6, 42.0e-6, 33.7e-6), (251e-6, 47.6e-6, 7.10e-6, 43.2e-6, 35.7e-6), (288e-6, 141e-6, 10.1e-6, 74.8e-6, 34.4e-6), (346e-6, 190e-6, 14.0e-6, 80.3e-6, 33.2e-6), (452e-6, 283e-6, 46.9e-6, 126e-6, 32.9e-6), (775e-6, 544e-6, 80.1e-6, 257e-6, 32e-6), (2.32e-3, 987e-6, 138e-6, 470e-6, 31.4e-6), (3.94e-3, 2.02e-3, 269e-6, 895e-6, 32.9e-6), (13.2e-3, 5.66e-3, 772e-6, 2.13e-3, 36.4e-6), (21.8e-3, 11.7e-3, 2.53e-3, 4.75e-3, 56.6e-6)].

Nothing beats simple NumPy on small datasets, and the turning point when transferring to GPU becomes attractive seems to be around 20,000 observations.

Discussion

Feel free to post a comment by e-mail using the form below. Your e-mail address will not be disclosed.

📝 You can use Markdown with $\LaTeX$ formulas in your comment.

By clicking the button below, you agree to the publication of your comment on this page.

Opens your e-mail client.