- Ekow Duker
Once is an accident...
Last week’s election in the United Kingdom threw up another surprise. Instead of strengthening their parliamentary position as expected, Theresa May and her Conservative Party dramatically weakened it. Attention has already turned to pollsters and statisticians, who, after the Brexit vote and the 2016 US election, seem to have got it wrong once more. As the saying goes, “Once is an accident. Twice is a coincidence. Three times is just plain daft.” As statisticians ourselves, we think it’s more informative to sidestep the finger-pointing and look at possible reasons why pollsters are thought to have missed the mark.
Probability matters

Part of the reason lies with the public themselves. The public has an intuitive understanding of point estimates but copes less well with a probabilistic approach. In the last US election, one forecasting model gave Hillary Clinton an 85% probability of becoming president and Donald Trump only 15%. In other words, if it were possible to hold the election 100 times, Donald Trump could be expected to win about 15 of them. His victory was unlikely, but it was by no means impossible. Similarly, a 15% chance of rain does not guarantee it will be sunny.
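The point is easy to see in a quick simulation. The sketch below (a toy illustration, using an arbitrary random seed; the 15% figure is the forecast probability mentioned above) "holds" the election 100 times and counts how often the underdog wins:

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility

# The underdog has a 15% chance of winning any single election.
WIN_PROB = 0.15
N_ELECTIONS = 100

# Each simulated election is a single Bernoulli trial.
wins = sum(random.random() < WIN_PROB for _ in range(N_ELECTIONS))
print(f"Underdog wins {wins} of {N_ELECTIONS} simulated elections")
```

Run it a few times with different seeds and the count hovers around 15: an unlikely outcome, but one that a forecast of 15% fully allows for.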
Sampling error

There is also the small matter of sampling error. Pollsters use a number of methods to construct a representative sample from a given population. These include random digit dialing, whereby people are selected for a poll by generating telephone numbers at random. Because people move all the time, such sampling methods may not always be representative of the people actually living in a particular location. Furthermore, where a sample is thought to lack representative coverage, a pollster can apply weights to certain demographics within the sample to compensate. As weighting methodologies differ, different pollsters looking at the same data are likely to come up with different forecasts.
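To see how weighting changes a forecast, consider a toy example with invented numbers: the sample under-represents younger voters, who happen to favour Candidate A, so reweighting each group back to its population share shifts the headline estimate.

```python
# Hypothetical figures: share of each age group in the population
# versus in the sample, and observed support for Candidate A
# within each sampled group.
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
sample_share     = {"18-34": 0.15, "35-64": 0.55, "65+": 0.30}
support_for_a    = {"18-34": 0.60, "35-64": 0.48, "65+": 0.40}

# Unweighted estimate: average support as the sample falls.
unweighted = sum(sample_share[g] * support_for_a[g] for g in sample_share)

# Weighted estimate: rescale each group to its population share.
weighted = sum(population_share[g] * support_for_a[g] for g in population_share)

print(f"Unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

Here the unweighted sample puts Candidate A on 47.4%, while the reweighted estimate is 50.0% — the same raw data, two different headline numbers, which is exactly why pollsters with different weighting schemes disagree.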
After last week’s UK election, attention has turned to so-called online filter bubbles. These are tightly knit online communities whose sole purpose is to promote their candidate. As some of these groups are closed, the vibrant conversations that happen within them are not easily detected and are therefore excluded from pollsters' samples and forecasting models. The Labour Party in the UK is said to have used such groups on Facebook to good effect.
With the speed and reach of online sharing, influential posts doing the rounds on social media in the run-up to an election can also sway opinion. Voluntary sharing of online content is thought to be a much better indicator of engagement with a topic than the number of views. Such posts would not necessarily be considered by forecasters, especially if they arise late in the campaign.
Measurement error

When polled, people may lie for any number of reasons. When asked who they voted for in the last election, they may misreport and say they voted for the eventual winner when in fact they didn’t. Or they may say one thing in a poll and change their minds afterwards, depending on what is happening around them. One party may stumble in the closing days of an election while the other experiences a late surge in popularity, as Jeremy Corbyn’s Labour Party reportedly did last week.
Systematic error

As the last US election showed, winning the popular vote is quite different from winning the presidency. Hillary Clinton won the popular vote but not enough Electoral College votes, and in the US electoral system, it’s Electoral College votes that count. The means by which county- or constituency-level forecasts are extrapolated into presidential or parliamentary forecasts is a function of the political system in place, and this can give rise to systematic error.
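A toy example with invented vote counts shows how this aggregation step can flip an outcome: a candidate can pile up huge margins in a few areas, win the popular vote overall, and still lose under a winner-take-all system.

```python
# Hypothetical three-state election. Each tuple holds
# (votes for A, votes for B, electoral votes at stake);
# the winner of each state takes all its electoral votes.
states = [
    (600_000, 400_000, 10),  # A wins by a landslide
    (490_000, 510_000, 10),  # B wins narrowly
    (480_000, 520_000, 10),  # B wins narrowly
]

popular_a = sum(a for a, b, ev in states)
popular_b = sum(b for a, b, ev in states)
electoral_a = sum(ev for a, b, ev in states if a > b)
electoral_b = sum(ev for a, b, ev in states if b > a)

print(f"Popular vote:   A {popular_a:,} vs B {popular_b:,}")
print(f"Electoral vote: A {electoral_a} vs B {electoral_b}")
```

Candidate A takes the popular vote 1,570,000 to 1,430,000 yet loses the electoral vote 10 to 20 — so a model that forecasts the national vote share accurately can still call the result wrong if it mishandles how votes aggregate within the political system.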
According to a dataskeptic.com podcast on elections, “Polls are not right or wrong. They are simply measurements with varying degrees of sampling, measurement and systematic error”. No poll can ever eliminate these errors completely. The best polls, however, go a long way toward reducing them.
Acknowledgement: Some of the ideas in this blog are taken from dataskeptic.com