When I Grow Up
Training the next generation of data scientists
In 2012, Harvard Business Review called data science the sexiest job of the 21st century. As more companies realize the value of harnessing their data, the demand for skilled analysts who can turn raw data into valuable insights grows. By 2018, it is anticipated that data science jobs in the U.S. will exceed 490,000, with fewer than 200,000 data scientists available to fill these positions.
The rapid growth of the data economy has outpaced the supply of data-savvy analysts, leading to the current skills shortage. A major contributing factor to this shortage has been the lack, until very recently, of dedicated training programs for data scientists. Most practicing data scientists did not train for the specific tasks that they must perform on a daily basis. Rather, they have adapted their existing skills, often as computer programmers, statisticians, or scientists. Those who have succeeded in making this transition generally share certain characteristics that have allowed them to rapidly master the new skills and technologies data scientists typically rely on.
Education is now catching up with industry demand, and there is a wide range of options for training specifically as a data scientist, at both undergraduate and postgraduate level at established universities and colleges, and through online learning platforms such as Coursera. As these courses proliferate, we should consider which skills young data scientists need in order to be prepared for a future in this rapidly evolving profession.
Obvious answers to this question include computer programming and statistics. But even these obvious answers require pause for thought. The programming languages used in different industries often vary, and the popularity of these languages changes over time. Perhaps the most important skills a young data scientist can be equipped with are the ability to rapidly adapt to new languages and the ability to translate abstractions into new implementations. Every year the technology and methods used for the most advanced predictive analytics evolve, and any practitioner who cannot keep up with this evolution will quickly fossilize.
For data-driven insights to be truly effective, data scientists need a substantive understanding of the industry to which their insights are relevant. Because the range of industries in which a data scientist may work is extremely broad, it is not possible to teach this as part of a foundational training programme. This knowledge must be acquired through experience and will require much self-instruction.
The foundational skills of statistical understanding, computer programming, and knowledge of (current) cutting-edge technologies are prerequisites for the modern data scientist. No less important is the ability to:
/effectively communicate ideas to audiences of varying technical understanding
/rapidly learn new technologies through self-instruction
/develop deep understanding of new businesses
/adapt core skills to provide industry-specific insights
Image: A boy clutching a chalkboard in an overheated and unfurnished classroom. Credit: The Ernest Cole Family Trust/Hasselblad Foundation Collection