- Steven Sidley
Chris Anderson, erstwhile editor-in-chief of Wired magazine, recently made the comment that algorithms have upended science. His reasoning was straightforward - science has always progressed from hypothesis to proof. Or, to put it somewhat differently, a model is proposed and then tested against observation. If it doesn’t work, then a new model is sought.
But disciplines like data science often take an entirely different route to discovery. A beautiful example of this came from Walmart’s data algorithms, which uncovered the startling fact that consumers stock up on Pop-Tarts before a storm. No amount of modelling could have predicted that - the statistical foraging of data science excavated this fact.
This is not to say that modelling, or the development and proof of hypotheses, has had its day. On the contrary, the more that scientists know about their worlds, the more they are able to take leaps of theoretical logic and wait for experimental evidence to catch up. This is the story of general relativity - there was scarcely any experimental evidence when it was published by Albert Einstein in 1915, and now, over 100 years later, its predictions continue to gain the support of evidence (the latest being gravitational waves).
But data science and machine learning often dance to a different tune. Statistics, which underpins much of data science, gives us a set of lenses through which we can look for patterns without knowing a priori what those patterns might be. Take a tsunami of data, filter it through any number of statistical transformations or neural nets, and see what appears on the other side. Without the scientist’s intent (or with only fuzzy intent), a judicious use of these filters might produce startling results, like statistically significant correlations between two hitherto unrelated data sets.
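The kind of hypothesis-free scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular production system: the function names (`pearson`, `permutation_p_value`, `scan_for_correlations`), the column names and the synthetic data are all made up for the example. It compares every pair of columns, estimates significance with a simple permutation test, and applies a Bonferroni correction, because looking at many pairs at once makes chance "discoveries" more likely.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials, rng):
    """Share of random shuffles of ys whose |r| matches or beats the observed |r|."""
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (hits + 1) / (trials + 1)


def scan_for_correlations(columns, alpha=0.05, trials=1000, seed=0):
    """Test every pair of columns; return (name_a, name_b, r, p) for significant pairs."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(sorted(columns), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction for multiple comparisons
    findings = []
    for a, b in pairs:
        r = pearson(columns[a], columns[b])
        p = permutation_p_value(columns[a], columns[b], trials, rng)
        if p <= threshold:
            findings.append((a, b, r, p))
    return findings


# Synthetic, made-up data: sales are built to depend on the storm signal.
data_rng = random.Random(42)
storm_warning = [data_rng.random() for _ in range(60)]
columns = {
    "storm_warning": storm_warning,
    "pop_tart_sales": [10 + 5 * w + data_rng.gauss(0, 0.5) for w in storm_warning],
    "unrelated": [data_rng.gauss(0, 1) for _ in range(60)],
}

for a, b, r, p in scan_for_correlations(columns):
    print(f"{a} ~ {b}: r={r:.2f}, p={p:.4f}")
```

Note that the scan reports correlation only. Whether storms actually cause the extra sales is a question the filter itself cannot answer.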
Once a correlation is unearthed, traditional science can be re-introduced: models and hypotheses can be constructed, and causation can be captured and locked down. Or we can simply iterate our algorithms, dreaming up new statistical or learning filters and watching how these correlations change colour against their new backgrounds, so that conclusions become easier and easier to draw, even if actual causation stays out of reach.
So data science is often an iterative exploration - we don’t always know what we will find. Traditional science is the opposite - we think we know exactly what we are looking for, and set out to find a way to prove it. Put the two together, and the chances of unique and useful discoveries multiply.
At Ixio Analytics we chase these twin rainbows. Our data scientists are steeped in traditional scientific methods. We propose, we forage, we model, we filter. And at the end, we discover.