- Steven Sidley
The Perfect Data Scientist - A Mythical Beast?

John Tukey was a Princeton statistician who, in the 1960s, first imagined the job of a data scientist. It is safe to say that this new job category did not catch a popular wave until decades later, when Hal Varian, Chief Economist at Google, conflated statisticians and eroticism in 2009 by proclaiming the career to be the ‘sexiest job in the world’, thereby exciting the prurient interests of inflamed techies all over the globe.
In case you are wondering about the linkage between statistics and data science, here is the pithiest definition of a data scientist I have read: “A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” Not very precise, but there is more than a kernel of truth in there.
Still, a little rigour is required. The problem with pure statisticians is that most are not great coders, and fewer still understand the boundaries of software engineering. Their code is often a convoluted mess of spaghetti, and while it works for them, there is usually no need to share it with a team, or the world at large. When presented with a problem, the statistician will find an elegant predictive or modelling solution, even if it is buried inside a best-efforts scrabble of computer code.
On the other hand, the great coders of the world, for whom GitHub, commits and open-source protocols are baked into their DNA, rarely have deep statistical skills. They are sometimes unaware of assumptions, axioms, alternative predictive approaches and the many other tentacles of the statistical arts. So they grab SAS libraries or open-source modules that implement some sort of predictive solution, and are adept at jamming parameters through an API door.
This is not to say that never the twain shall meet. The global sprouting of data science education is trying hard to merge these two disciplines. Given the rewards on offer, the gap will surely close.
But there is a more pressing problem, and it sits within the enterprise. The subject of data science/data analysis/big data is all abuzz right now. And because of the word ‘data’, guess who does the hiring? It is usually a job description that ping-pongs between IT and Human Resources. And because IT managers (particularly legacy IT managers) have little understanding of the statistical pillar of data science, they tend to hire who they have always hired: programmers who might have done a course or two in data analytics or, worse, an internal enterprise programmer moved laterally into data science with an instruction to ‘read up on the subject’.
We have recently seen this in various enterprises we have visited: an entire team assembled as the ‘Data Team’ or ‘Data Hub’, followed by lots of busy work, with no solutions presented and no problems solved two years after a big internal launch.
There is a final pillar that is often not addressed by the job description. Without an understanding of operations and business process, no one (not even the mythical statistician/software engineer cyborg) is going to make much headway.
There are many guides out there for hiring data scientists. Here is one that we like: https://www.linkedin.com/pulse/what-do-hiring-managers-look-data-scientists-cv-ben-dias/
It’s written by Ben Dias, the Head of Advanced Analytics and Data Science at the Royal Mail in the UK. We think it covers most, if not all, of the bases.