Machine Learning

How we solve problems we don’t fully understand

Three weeks ago I wrote in a margin note: “Common ‘machine learning’ approaches are unlike other regression in that they do not produce a model or answer but instead an incomprehensibly-complicated function. Intuitions learned in science classes about hypotheses, testing, truth, and understanding are the wrong tools for machine learning.” Today I wish to expand on some parts of this statement.

Regression is the process of picking, from a family of functions, the specific function that comes as close as possible to some example data. For example, linear regression tries to find the m and b that make y = mx + b most closely approximate the available (x, y) pairs.
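A minimal sketch of the linear case, using the standard closed-form least-squares solution. Here "closest" is taken to mean minimizing the sum of squared vertical distances; that is our assumption, since the text does not commit to a particular measure. The function name `fit_line` and the sample data are illustrative.

```python
def fit_line(points):
    """Least-squares fit of y = m*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    # Intercept: the least-squares line passes through the mean point.
    b = mean_y - m * mean_x
    return m, b

data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]
m, b = fit_line(data)
print(f"y = {m:.2f}x + {b:.2f}")  # prints "y = 1.94x + 1.09"
```

The family here is "all lines"; regression just picks the member (the particular m and b) that best matches the four example points.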

“Machine learning” is a term that is sometimes used for any regression (or its discrete counterpart, classification) where an algorithm learns the value of some parameters based on example data. It is also sometimes used for the aspiration of making a machine learn on its own the way animals do. Most often it is used to mean something in between: regression to some family of functions that is complicated enough that we do not need any a priori understanding to pick the right family of functions.

This notion of a priori understanding is important to regression. Consider, for example, trying to regress a function to the positions an orbiting body passes through. If our function family is lines or polynomials, we’ll never find parameters that work, because orbits are (roughly) circles and circles cannot be expressed within those function families. With a high-enough-order polynomial we can get close to the circle within the domain we have data for, but that approximation will fail very quickly outside that domain. Additionally, the coefficients of the polynomial have no interpretable meaning: they are just the coefficients needed to approximate a circle over the given range of x, and they will all change if we tweak that domain.
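The polynomial half of this argument can be checked with a small experiment. The sketch below rests on our own assumptions: the "orbit" is the upper half of a unit circle, y = sqrt(1 - x^2); the family is degree-4 polynomials; the fit is plain least squares via the normal equations; and all helper names are hypothetical. It fits the same shape over two different ranges of x and compares the results.

```python
import math

def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations."""
    k = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    vtv = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    vty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(vtv, vty)

def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def semicircle(x):
    # Upper half of the unit circle: the "true" shape being approximated.
    return math.sqrt(1 - x * x)

def sample(lo, hi, n=41):
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return xs, [semicircle(x) for x in xs]

narrow = fit_polynomial(*sample(-0.5, 0.5), degree=4)
wide = fit_polynomial(*sample(-0.8, 0.8), degree=4)

# x = 0.3 is inside the narrow fit's domain; x = 0.9 is outside it.
for x in (0.3, 0.9):
    print(f"x={x}: true={semicircle(x):.4f}  fit={evaluate(narrow, x):.4f}")
print("coefficients fit on [-0.5, 0.5]:", [round(c, 4) for c in narrow])
print("coefficients fit on [-0.8, 0.8]:", [round(c, 4) for c in wide])
```

The narrow fit is very accurate at x = 0.3 but noticeably off at x = 0.9, and the even-power coefficients differ between the two fits even though the underlying circle is the same. (The odd-power coefficients come out numerically zero only because the semicircle is symmetric.)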

Machine learning, as commonly used, refers to regression to very complicated functions that, like high-order polynomials, have enough parameters that they can approximate almost anything but, unlike polynomials, work nicely with many-dimensional input and have less dramatic failure behavior outside their domain.

For example, one popular function family (called “artificial neural networks” or ANNs; I am giving a simplified description of one common class of ANNs) operates by multiplying each input vector by a large matrix (the elements of which are the parameters we fit, just as we fit the coefficients of a polynomial in polynomial regression), then applying a nonlinear function to each element of the resulting vector and multiplying by another large matrix, then applying a nonlinear function to each element of that vector and multiplying by yet another large matrix, and so on for several steps. (If we use a lot of steps and pay attention to the results of the intermediate steps, we call it “deep learning”, not to be confused with deep learning.) Having millions of parameters and some nonlinear steps, this approach is very flexible. Being made of simple components, it tends to be “smooth”. And we know of algorithms that can pick the parameters to fit a set of data points.
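The forward computation described above can be sketched in a few lines of Python. This only illustrates the simplified description: the choice of ReLU, max(0, v), as the nonlinearity, the layer sizes, and the random initialization are our assumptions. Practical ANNs usually also add a bias vector at each step, and the algorithm that fits the parameters is not shown.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def relu(values):
    # One common elementwise nonlinearity (an assumption; many others exist).
    return [max(0.0, v) for v in values]

def forward(layers, x):
    """Alternate matrix multiplication with an elementwise nonlinearity.

    `layers` is a list of weight matrices; the last multiplication is
    not followed by a nonlinearity.
    """
    for i, weights in enumerate(layers):
        x = matvec(weights, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def random_matrix(rows, cols, rng):
    # Hypothetical initialization: Gaussian entries scaled by 1/sqrt(cols).
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]

rng = random.Random(0)
# Hypothetical sizes: 3 inputs -> 16 hidden -> 16 hidden -> 1 output.
layers = [random_matrix(16, 3, rng), random_matrix(16, 16, rng),
          random_matrix(1, 16, rng)]
print(forward(layers, [0.5, -1.0, 2.0]))
print("parameters:", sum(len(m) * len(m[0]) for m in layers))  # 320
```

With these sizes the network has 320 parameters (3·16 + 16·16 + 16·1); the millions mentioned in the text come from using much larger matrices and more of them.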

ANNs and other popular machine learning approaches do not provide models of the world or underlying understanding. They do not form scientific hypotheses nor are they generally useful in testing them. They are simply approximations: hopefully useful approximations, but not themselves telling.

In machine learning, we call the data points used to perform regression “training data”. In general, training data is collected via traditional data-collection methods, meaning it can suffer from many types of data-quality problems. Any systematic errors, biases, or flaws in the training data are replicated in the machine-learned functions. Additionally, the fancier the machine-learning function family, the more training data is needed to get a good fit, and the more data you need, the less affordable it is to collect high-quality data.
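To see why a systematic flaw survives the fit while random noise tends not to, here is a hedged illustration using plain least-squares line fitting (redefined so the block stands alone). The ground truth y = 2x + 1, the noise level, and the constant +0.5 instrument offset are all hypothetical.

```python
import random

def fit_line(points):
    """Least-squares fit of y = m*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x

rng = random.Random(42)
xs = [i / 10 for i in range(100)]
# Hypothetical ground truth y = 2x + 1, measured with random noise...
noisy = [(x, 2 * x + 1 + rng.gauss(0, 0.3)) for x in xs]
# ...and the same measurements from an instrument that reads 0.5 too high.
biased = [(x, y + 0.5) for x, y in noisy]

print(fit_line(noisy))   # roughly (2, 1): random noise largely averages out
print(fit_line(biased))  # same slope, intercept 0.5 higher: the bias is learned
```

Adding a constant to every y value leaves the slope formula unchanged and shifts the mean of y by that constant, so the biased fit has the same slope and an intercept exactly 0.5 higher (up to floating-point rounding). Collecting more data from the same instrument would not remove the offset.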

Thus, while machine learning lets us do things we don’t know how to do otherwise, it does so at a cost: both errors and biases are endemic to machine learning. Of course, errors and biases are also endemic to human reasoning, and particularly endemic to the “gut instinct” type of reasoning that machine learning hopes to one day emulate. (The broader field of artificial intelligence includes efforts to emulate rational thought, not just gut instinct, but not through machine learning.)