Reasonable linear models

Today I read in yet another (recent) machine learning paper:

It is reasonable to suppose that [super complex and interesting phenomenon] can be approximated by a linear model based on [whatever features we could get our hands on].

Wiktionary offers three meanings for the word reasonable:

  • Just; fair; agreeable to reason.
  • Not expensive; fairly priced.
  • Satisfactory.

Proving that the choice of a linear model is "satisfactory" or "agreeable to reason" would require some form of data analysis, or maybe a benchmark against competing models. But when we go for a linear model, it is sadly often the case that neither of these has been performed, and "reasonable" should be read as meaning number two: "for lack of something better".

What we can say about a linear model is that it is:

  • Simple, and often the most convenient to work with.
  • Readable: if the regressors have semantics, then so do the linear coefficients.
  • Sometimes, able to capture the dynamics of your problem.

But do these make it reasonable per se? If you have measures that validate your choice of this model, yes. Otherwise, keep in mind that not everything is approximable by a linear function, even in the real world. Convenience is a big driver behind the tools and scientific ideas we try, and it is unfortunately easy to believe we are guided by reason while rolling down a purely contingent path.
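As a rough illustration of what such a validating measure could look like, here is a minimal sketch that benchmarks a linear fit against one competing model on held-out data. Everything in it is an assumption made for the example: the data is synthetic (a noisy quadratic), the competitor is a basic k-nearest-neighbor regressor, the score is the coefficient of determination R², and the helper names `fit_line`, `fit_knn` and `r_squared` are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbor regression, used here as the competing model."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean([y for _, y in nearest])

    return predict


def r_squared(model, xs, ys):
    """Coefficient of determination of a model on the given samples."""
    my = fmean(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic, purely illustrative data: the underlying phenomenon is quadratic.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
ys = [x * x + rng.gauss(0.0, 0.05) for x in xs]

# Fit on the first 200 samples, evaluate on the remaining 100.
x_train, y_train = xs[:200], ys[:200]
x_test, y_test = xs[200:], ys[200:]

r2_linear = r_squared(fit_line(x_train, y_train), x_test, y_test)
r2_knn = r_squared(fit_knn(x_train, y_train), x_test, y_test)
print(f"held-out R^2, linear model: {r2_linear:.2f}")
print(f"held-out R^2, k-NN model:   {r2_knn:.2f}")
```

On data like this, the linear model should score close to zero while the competitor scores much higher, which is the kind of evidence that would tell us the linear choice is convenient rather than reasonable. On data where the two scores are comparable, the same check would support keeping the simpler model.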

If you like your readings mind-teasing and refreshing, on this topic you can take a look at Poincaré's Science and Hypothesis (1902). You are welcome :-)



© Stéphane Caron — All content on this website is under the CC BY 4.0 license.