The Machine That Made Mistakes
Some of you might have watched or read about last month’s big-money fight in Las Vegas between Conor McGregor and Floyd Mayweather. The fight was stopped in the tenth round to save Conor McGregor from further punishment. In boxing parlance, the Irishman’s legs had ‘gone’ and it looked like only a matter of time before he went down. Nonetheless, Conor McGregor protested that the referee had stopped the fight too early and had this to say to a news reporter immediately after the fight:
‘Wobbly and fatigue…That’s energy, that’s not damage. I’m clear headed!’
In other words, Conor McGregor maintained that the referee had seen one thing and interpreted it as another thing entirely. This is a common phenomenon in human psychology and statistics.
Bias, bias, everywhere. Humans suffer from a range of cognitive biases that the futurist Peter Diamandis lists in a recent blog post. These include the herd effect, or the tendency to gallop in a particular direction because others are doing the same; the overconfidence effect, or the inclination to think more highly of one’s ability than is warranted; and my personal favorite, the illusion of control, where a sports fan will wear his team’s jersey to the stadium, or worse still, to the pub, in the hope of influencing the outcome of the game.
Machines, you would think, are immune to such flaws. A machine, after all, has none of the irrational urges or emotional volatility that plague us humans. Unfortunately, machines do get it wrong and can be as fallible as the Arsenal fan crying into his beer at halftime.
In assessing how accurate a machine learning model is, a data scientist will construct a quick tally of what the machine predicted against what was actually observed. Two types of error arise as shown in the illustrations above.
A Type I error, or false positive, occurs when you mistakenly reject a true hypothesis - in this case, that the man in the picture is not pregnant. A Type II error, or false negative, occurs when you wrongly accept a false hypothesis - in this case, that the woman in the picture is not pregnant.
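For readers who like to see the tally spelled out, here is a minimal sketch in Python. The function name `tally` and the six labels are made up purely for illustration (1 means pregnant, 0 means not pregnant); they do not come from any real data set.

```python
from collections import Counter

def tally(predicted, observed):
    """Count the four possible outcomes of a yes/no prediction."""
    counts = Counter()
    for p, o in zip(predicted, observed):
        if p and o:
            counts["true_positive"] += 1
        elif p and not o:
            counts["false_positive"] += 1   # Type I error
        elif not p and o:
            counts["false_negative"] += 1   # Type II error
        else:
            counts["true_negative"] += 1
    return counts

# Hypothetical labels: 1 = pregnant, 0 = not pregnant
predicted = [1, 0, 1, 0, 0, 1]
observed = [1, 0, 0, 1, 0, 1]

result = tally(predicted, observed)
for outcome, count in sorted(result.items()):
    print(outcome, count)
```

With these invented labels the machine makes one error of each type: it calls one non-pregnant case pregnant (Type I) and misses one pregnant case (Type II).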
All models are wrong. As statistician George Box famously wrote, ‘All models are wrong but some are useful’. So how wrong does a machine learning model have to be before it is no longer illuminating or useful? The answer is, it depends. It depends on what the machine has been asked to do.
Consider a neural network that fails to predict the onset of cancer from a data set of cell characteristics. This Type II error could lead to emotional trauma, crippling medical bills or untimely death. In this case, the cost of the machine getting its sums wrong is prohibitively high. On the other hand, a model may incorrectly predict that a credit applicant will not repay her loan when in fact she would. From the perspective of a risk-averse lender, this is a much better outcome than wrongly predicting that the borrower will repay her loan when in reality she would not.
So context is important in assessing the cost of the errors a machine will inevitably make. As we go about our daily lives, most of us are oblivious to our own biases and the biases of others. Conor McGregor may be a statistician after all.
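One way to make 'it depends' concrete is to put a price on each type of error and add up the bill. The sketch below is only an illustration: the function `total_cost`, the two imaginary models and the cost weights are all assumptions chosen to show the idea, not real figures from medicine or lending.

```python
def total_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Weight each kind of error by what it costs, then add them up."""
    return false_positives * cost_fp + false_negatives * cost_fn

# Two hypothetical models that each make ten errors, but of different kinds
model_a = {"false_positives": 8, "false_negatives": 2}
model_b = {"false_positives": 2, "false_negatives": 8}

# Assumed weights: a missed cancer (false negative) is taken to be
# fifty times as costly as a false alarm (false positive)
screening = {"cost_fp": 1, "cost_fn": 50}

cost_a = total_cost(**model_a, **screening)
cost_b = total_cost(**model_b, **screening)
print("Model A:", cost_a)
print("Model B:", cost_b)
```

Both models are wrong equally often, yet under these assumed weights Model B is far more expensive because more of its mistakes are the costly kind. Change the weights and the ranking can flip, which is why the same error rate can be acceptable for one task and unacceptable for another.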