Once Upon A Time
It being that time of year, I thought I would cast my gaze backwards a few decades, in order to provide some context as to how we have arrived at the current state of the art.
Once upon a time, in the 70s and 80s, there was something called MIS. It stood for Management Information Systems and was a catch-all phrase that IT chiefs used to describe the set of outputs that enabled managers to make smarter business decisions. The outputs were simple - sales charts, revenue trends and the like.
This was replaced, sometime in the 90s, by a field called BI. It stood for Business Intelligence, and guess what? It was also a catch-all phrase that IT chiefs used to describe the set of outputs that enabled managers to make smarter business decisions. However, BI was a bit better than MIS in that it could be abstracted away from hardware and storage constraints and housed in its own technology and governance silos, usually guarded by a sentinel called a database administrator.
There was little difference between MIS and BI, although BI reports may have looked prettier. Business software had started expanding beyond financial reporting into other areas such as HR, supply chain, manufacturing and document management. But BI was still producing nice-looking pictures that represented only the simplest sort of insight. It provided tools that let one see the surface level of data in more accessible ways - trend lines, bubble charts, histograms, simple regressions and the like.
Then in the first decade of the new millennium we had Data Analytics, where software providers like SAS, SAP, Microsoft and Oracle started applying more statistical rigour to their toolsets. Visualisation tools proliferated, and analysis became deeper and multivariate. Terms like data warehousing and data mining sprang into the technology vernacular. Database schemas started to break out of their relational straitjacket, typified by SQL and SQL-like approaches to data mapping, query and storage. The tools jumped in complexity, and some serious expertise was required to learn them. Insights grew deeper, and the concept of data as raw material with embedded intelligence took hold.
How then did we make the leap from Data Analytics to Data Science?
In trying to make sense of how we got here, we have to consider the speed with which a confluence of technological, social and economic forces conspired to give data a starring role in our present and future.
1) There has been an explosion of data (particularly real-time data) enabled by the global Internet, smartphones and the falling cost of data sensors. Much of this data sits outside of traditional database schemas. Much of it is unstructured or language-based, or is inferred from clicks, eyeballs and other new metrics.
2) There was a collapse in the price of storage and associated compute power enabled by numerous technological and architectural advances at the beginning of this millennium such as Hadoop and Amazon Web Services. This evolved into a cornucopia of architectures and frameworks and technologies, all freely or cheaply available to researchers and entrepreneurs well outside of Big Enterprise.
3) Cheap storage and compute power and newer, smarter frameworks and ecosystems like Spark and Pig have unleashed a tsunami of innovation, attracting cross-disciplinary practitioners from outside the computer science ivory tower. Statisticians, mathematicians, ecologists, meteorologists, astronomers, epidemiologists, physicists, even social scientists and political pollsters now have access to advanced data tools, with endless depths of insight to plumb.
4) Data accessibility is suddenly a thing. Organisations that once jealously guarded their data are suddenly happy to allow the wisdom of the crowd to provide them with insight. Enterprises are also now outsourcing solution design to competition platforms such as Kaggle. The online site Quandl has over 1 million datasets covering every imaginable economic, social and financial indicator, all current and free.
5) The entire field of machine learning and AI burst out of its long lonely hibernation in research labs and universities and into the public sphere. Models, approaches, software, academic papers and very public successes from the likes of Google and IBM have led to solid and robust advances in the underlying mathematics and a proliferation of courses from almost every major university in the world.
Throw all of these factors into a pot, take a look at some of the stuff now popping up (like blockchain), and it is hard not to conclude that we are entering a golden era: one where infrastructure and compute and storage will play second fiddle to a much more important question that we have long asked and not always been able to answer, which is, 'Yes, but what does it all mean?'