- 29th May, 2018

## Analytics and machine learning - a value framework

By Steven Sidley

In our engagements with various clients and prospects, we are often confronted with confusion about the difference between ‘data analytics’, ‘machine learning’ and ‘AI’. This confusion is in good company. The lines between them are sometimes a little fuzzy, even to battle-scarred veterans.

A quick look through the literature only furthers this confusion. But Gartner has come up with what I believe to be the most compelling and useful framework. They divide the world of data analytics into three discrete families - Descriptive Analytics, Predictive Analytics and Prescriptive Analytics (more on AI later).

**Descriptive Analytics** occupies the low-complexity, low-value spot on the graph. This is not to say that it is of low utility. On the contrary, descriptive analytics is concerned with ways of representing complex data in easily digestible formats, where trends or correlations are fairly easy to unearth without complex statistical analysis. This is particularly true of the sophisticated visualisation techniques used in descriptive analytics, where numbers and tables can be better ‘seen’ in graphs and charts and bubbles and histograms and pies and the like. Basic stats are sometimes also applied, but usually no further than techniques such as simple linear regressions to extend trendlines or other obvious patterns. Advanced visualisation is a critical tool, but has gradually given way to more mathematically adventurous wrangling of the data.
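To make the ‘basic stats’ concrete, here is a minimal sketch of the kind of simple linear regression descriptive analytics applies to extend a trendline. The monthly sales figures are invented purely for illustration:

```python
# A minimal, illustrative sketch: fit a straight trendline to monthly
# figures and extend it forward. All numbers below are made up.
import numpy as np

months = np.arange(1, 13)                      # months 1..12
sales = np.array([10, 12, 13, 15, 14, 16,
                  18, 17, 19, 21, 22, 24])     # hypothetical sales figures

# Ordinary least-squares fit of a straight line: sales ~ slope * month + intercept
slope, intercept = np.polyfit(months, sales, deg=1)

# Extend the trendline three months beyond the observed data
future_months = np.arange(13, 16)
projection = slope * future_months + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("projected:", np.round(projection, 1))
```

This is about as far as descriptive analytics usually goes: the fitted line says nothing about *why* sales move, it simply makes the existing pattern easier to see and extend.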

**Predictive Analytics** puts numbers under the more rigorous microscope of statistical predictive modelling and analysis. Whether this be a horizontal expansion beyond the 2D matrix mathematics of Excel-style tables and into multiple dimensions, or the discovery of new clusters of seemingly unconnected data across new variables or axes, or more sophisticated curve fitting than straight regression, this discipline seeks to predict the movement of data over time. The underlying mathematics requires advanced software to actualise, for which companies like SAS have been well known for decades. However, Predictive Analytics remains an evolving field, particularly as new datasets and new means of data collection continue to proliferate.
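As a hedged sketch of ‘more sophisticated curve fitting than straight regression’, consider data with a clearly curved shape, which a straight line would model poorly but a quadratic captures exactly. The data is synthetic and purely illustrative:

```python
# Illustrative only: fit a quadratic curve where straight regression
# would fail, then extrapolate it to forecast the next periods.
import numpy as np

t = np.arange(10)                         # time periods 0..9
y = 0.5 * t**2 - t + 3                    # synthetic data with a quadratic shape

coeffs = np.polyfit(t, y, deg=2)          # least-squares quadratic fit
model = np.poly1d(coeffs)

# Predict the next two periods by extrapolating the fitted curve
future = model(np.array([10, 11]))
print("coefficients:", np.round(coeffs, 2))   # recovers [0.5, -1.0, 3.0]
print("forecast:", np.round(future, 1))
```

Real predictive work of course involves noisy data, many dimensions and model-selection judgement, but the shape of the exercise - fit a model, then project it forward - is the same.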

Which brings us to **Prescriptive Analytics**, which is, as the name implies, the branch of Data Science that directs itself towards a specific goal or outcome. Rather than trying to understand how to present the data in more digestible form, or to probe at its future state, it seeks to reach a stated outcome, like ‘save energy costs’, or ‘win at Chess’ or ‘translate perfectly from French to Turkish’. And it is machine learning which currently stands as the most important arrow in the prescriptive quiver.

In general, machine learning is valuable when you know what you want but you don’t know the most important input variables to get there, or more pointedly, when you don’t know how the variables interact with each other. So machine learning algorithms take as input a set of observations and variables, and then “learn” from the data which mix of variables needs to be massaged in order to achieve the desired outcome.

There are a number of underlying algorithms that attempt to get from data-to-outcome in fundamentally different ways, but what is common amongst them is the input of training data (sometimes human-provided, sometimes collected, and sometimes generated by software), followed by a mathematically rigorous ‘connecting of dots’ leading to a recommendation as to optimum strategies to reach the desired outcome. This is, of course, a simplification. The skill in identifying variables and training data and ‘dot-connecting’ algorithms is part of the machine learning and modelling discipline.
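A toy sketch can make the ‘learning from training data’ idea tangible. Here the algorithm is shown observations (inputs and outcomes) and gradually discovers how the input variables combine, via simple gradient descent on a linear model. The data, the hidden weights and the hyperparameters are all invented for illustration:

```python
# A toy illustration of learning from training data: the model starts
# knowing nothing and adjusts its weights to reduce prediction error,
# recovering the hidden mix of variables. All values are made up.
import numpy as np

rng = np.random.default_rng(0)
true_weights = np.array([2.0, -1.0, 0.5])       # hidden relationship to recover
X = rng.normal(size=(200, 3))                   # 200 observations, 3 variables
y = X @ true_weights                            # outcome driven by the variable mix

w = np.zeros(3)                                 # model starts knowing nothing
lr = 0.1                                        # learning rate
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)           # gradient of mean squared error
    w -= lr * grad                              # step towards lower error

print("learned weights:", np.round(w, 2))       # close to [2.0, -1.0, 0.5]
```

Real machine learning systems differ enormously in scale and technique, but this loop - predict, measure error, adjust - is the common skeleton of the ‘connecting of dots’.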

By way of example: Google wanted to save energy at its multiple global data centres. There were 120 inputs from their cooling systems - fans, pumps, windows, etc. These 120 variables were modelled and tested, and the machine learning algorithm was able to sculpt the optimum mix of variables to save the most energy. This was an example of a clear set of inputs, clear datasets, clear models, a clear desired outcome and a resultant optimum strategy that saved Google hundreds of millions of dollars in energy costs.
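To give a feel for what ‘finding the optimum mix of variables’ means, here is a vastly simplified sketch - nothing like Google’s actual system - with just two hypothetical control variables (a fan and a pump) and made-up cost and cooling functions. The search asks: which settings deliver enough cooling for the least energy?

```python
# Purely illustrative: grid-search two hypothetical control variables
# for the cheapest mix that still meets a cooling requirement.
# Every function and number here is invented for illustration.
import numpy as np

fans = np.linspace(0, 5, 501)                 # candidate fan settings
pumps = np.linspace(0, 5, 501)                # candidate pump settings
F, P = np.meshgrid(fans, pumps)

cost = F**2 + 0.5 * P**2                      # hypothetical energy cost
feasible = 3 * F + 2 * P >= 12                # hypothetical cooling requirement
cost = np.where(feasible, cost, np.inf)       # rule out settings that run too hot

i, j = np.unravel_index(np.argmin(cost), cost.shape)
best_fan, best_pump = F[i, j], P[i, j]
print(f"optimal mix: fan={best_fan:.2f}, pump={best_pump:.2f}")
```

The real problem had 120 interacting variables and learned (rather than hand-written) cost models, which is exactly why brute force gives way to machine learning at that scale.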

Machine learning has gotten smarter and smarter. Its ability to find optimal strategies in unexpected places continues to surprise (and amaze), and has exceeded anything that could have been produced a mere ten years ago. The underlying techniques (supervised vs unsupervised, deep learning, evolutionary learning, Bayesian modelling, etc) have spawned entirely new fields of research, some of them only a few years old.

Finally, **Artificial Intelligence**. I studied AI in the dark ages, when my specialty was computer-composed jazz solos. In my day, AI was defined as the study of how to build algorithms to do things that we normally think of as uniquely human creative activities. Like composing jazz. This definition, it seems, continues to move. Chess and the Chinese game of Go were once restricted to the creative domain of humans. Not any more. AI is a vast topic with a broad swathe of tools that are being brought to bear on difficult problems with some astonishing advances.

But that’s a topic for another day.