- Megan Yates
The Data Science Cycle and How To Optimize It
A recurring theme at several recent data science conferences and discussions we’ve attended is that businesses and business leaders do not always get value from their data science teams.
Value from data science tends to be measured in increased revenue, cost savings through optimization or happier customers. Across a typical data science cycle, there are key points where businesses can set themselves apart and avoid the common pitfalls that stop them from realizing the full potential and value of their data.
1. Clearly Define Use Cases Aligned to Business Outcomes
Unfortunately, data scientists, with their wealth of technical, statistical and machine learning expertise, don’t always build solutions with the business problem in mind. And sometimes the problems data scientists want to solve from an academic perspective aren’t the ones the business needs them to solve.
Make sure that use cases are closely aligned to business goals by pairing business partners with data scientists. Spend time defining use cases simply and ensure buy-in from the relevant stakeholders.
2. Put Effort into Data Engineering
Not all machine learning methods can take raw data as input. Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access, analysis and prediction. The features in your input data directly influence the resulting model and the results it achieves, so feature engineering, which transforms raw data into metrics that relate to and represent the use case you are trying to solve, matters a great deal. Typically, a process of data collation, wrangling and feature engineering is required before a use case can be addressed. These processes can be time-consuming and tedious and often require domain knowledge to get right.
Invest in data engineers to speed up and automate these processes. Ensure that domain expertise is brought to bear through close collaboration between domain experts, data scientists and data engineers.
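The collation, wrangling and feature engineering steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a production pipeline: the transaction records, field names and the three features (total spend, transaction count, days since last purchase) are all hypothetical, chosen only to show messy raw rows becoming per-customer metrics.

```python
from datetime import date

# Hypothetical raw transaction records: inconsistent customer IDs,
# amounts stored as strings, and some rows missing values.
RAW_TRANSACTIONS = [
    {"customer_id": " C001", "amount": "19.50", "date": "2024-01-05"},
    {"customer_id": "c001", "amount": "5.25", "date": "2024-02-10"},
    {"customer_id": "C002", "amount": "", "date": "2024-01-20"},
    {"customer_id": "C002", "amount": "42.75", "date": "2024-03-01"},
    {"customer_id": None, "amount": "7.00", "date": "2024-03-02"},
]


def wrangle(rows):
    """Clean and unify raw rows; drop rows that cannot be repaired."""
    clean = []
    for row in rows:
        cid = (row.get("customer_id") or "").strip().upper()
        amount_text = (row.get("amount") or "").strip()
        if not cid or not amount_text:
            continue  # unrecoverable: no customer or no amount
        clean.append({
            "customer_id": cid,
            "amount": float(amount_text),
            "date": date.fromisoformat(row["date"]),
        })
    return clean


def engineer_features(rows, as_of):
    """Aggregate cleaned transactions into per-customer features."""
    features = {}
    for row in rows:
        f = features.setdefault(row["customer_id"], {
            "total_spend": 0.0, "n_transactions": 0, "last_purchase": row["date"],
        })
        f["total_spend"] += row["amount"]
        f["n_transactions"] += 1
        f["last_purchase"] = max(f["last_purchase"], row["date"])
    # Replace the raw date with a recency metric relative to `as_of`.
    for f in features.values():
        f["days_since_last_purchase"] = (as_of - f.pop("last_purchase")).days
    return features


features = engineer_features(wrangle(RAW_TRANSACTIONS), as_of=date(2024, 3, 31))
for cid, f in sorted(features.items()):
    print(cid, f)
```

Note that wrangling silently drops two of the five rows here; in practice it is worth logging what was dropped and why, since domain experts may know how to repair it.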
3. Start with the Data You Have
Rather than wait months for the perfect dataset to be ready, test on samples and start with the data you currently have. The data will never be perfect, and there will always be more variables to add and test. Many machine learning methods are robust to missing data, so data scientists can’t use incomplete, messy data as an excuse.
A great deal of insight can be gained from the data you already have, and early modeling outcomes can show which data enhancements are worth pursuing in the future.
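One common way to keep working with incomplete data, rather than waiting for a complete dataset, is to fill each missing value with the column median and add a 0/1 flag recording that it was missing, so a model can still learn from rows with gaps. The sketch below assumes that technique specifically (median imputation with a missingness indicator); the records and field names are hypothetical.

```python
from statistics import median

# Hypothetical customer records; None marks a value that was never collected.
customers = [
    {"age": 34, "tenure_months": 12},
    {"age": None, "tenure_months": 30},
    {"age": 51, "tenure_months": None},
    {"age": 27, "tenure_months": 6},
]


def impute_with_indicator(rows, column):
    """Fill missing values with the column median and flag which rows were filled.

    Returns new dicts; the input rows are left unchanged.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill_value = median(observed)
    out = []
    for r in rows:
        new = dict(r)
        new[f"{column}_missing"] = int(r[column] is None)
        if r[column] is None:
            new[column] = fill_value
        out.append(new)
    return out


prepared = customers
for col in ("age", "tenure_months"):
    prepared = impute_with_indicator(prepared, col)

for row in prepared:
    print(row)
```

The indicator column matters: if missingness itself is informative (for example, newer customers lacking a field), the model can pick that up instead of treating the imputed median as a real observation.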
4. Build Lots, Simply and Quickly
Tweaking models to get incremental improvements in accuracy is hugely rewarding for almost all data scientists. But a 0.01% gain in accuracy, achieved over many, many hours of toil, likely won’t change business outcomes.
Build simple models quickly and test them in a business environment as soon as possible. The testing phase will provide a great deal of insight into a model’s usability and efficacy, so aim to get to this stage in the shortest possible time.
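As a toy sketch of building simply and quickly, the code below compares a trivial majority-class baseline with a one-feature threshold rule (a decision stump that only checks "at or above the cut-off means positive") on held-out data. The churn framing, the feature and the data are hypothetical; the point is that a simple model with a clear baseline can be ready for business testing in minutes.

```python
# Hypothetical labelled data: (days_since_last_purchase, churned) pairs.
data = [(5, 0), (12, 0), (40, 1), (8, 0), (65, 1), (30, 0),
        (90, 1), (15, 0), (55, 1), (22, 0), (70, 1), (10, 0)]

# Simple hold-out split: first 8 rows for fitting, last 4 for evaluation.
train_rows, test_rows = data[:8], data[8:]


def accuracy(predict, rows):
    """Fraction of rows where predict(x) matches the label y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Baseline: always predict the most common label in the training data.
labels = [y for _, y in train_rows]
majority = max(set(labels), key=labels.count)


def baseline(x):
    return majority


def fit_stump(rows):
    """Pick the cut-off t (from observed values) maximising training accuracy of x >= t."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        acc = accuracy(lambda x: int(x >= t), rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


threshold = fit_stump(train_rows)


def stump(x):
    return int(x >= threshold)


print("majority baseline accuracy:", accuracy(baseline, test_rows))
print("threshold:", threshold, "stump accuracy:", accuracy(stump, test_rows))
```

With real data the hold-out set would be much larger, but the pattern is the same: if a more complex model cannot clearly beat rules this simple, the extra effort is unlikely to pay off.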
5. Trust Your Data Scientists
Success in realizing the value of your data through data science depends heavily on business partners and stakeholders trusting the science behind the methods and taking the often uncomfortable step of testing in the real world, i.e. with real customers.
Testing can be done on samples of customers so that the risk stays small. Business leaders who are willing to experiment set their organizations up to succeed in deriving value from data.
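A small, low-risk customer test might look like the sketch below. It assumes two specific techniques: a deterministic hash-based split that places roughly 5% of customers in the test group, and a two-proportion z-test to compare conversion rates between control and test. The salt, fraction, customer IDs and conversion counts are all hypothetical.

```python
import hashlib
import math


def in_test_group(customer_id, fraction=0.05, salt="pilot-1"):
    """Deterministically assign roughly `fraction` of customers to the test group."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the first 32 bits to [0, 1]
    return bucket < fraction


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic (B minus A) using the pooled standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


customers = [f"cust-{i}" for i in range(10_000)]
test_group = [c for c in customers if in_test_group(c)]
print(len(test_group), "of", len(customers), "customers in the test group")

# Hypothetical outcomes, separate from the split above:
# 40 of 1,000 control customers vs 60 of 1,000 test customers converted.
z = two_proportion_z(40, 1000, 60, 1000)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Hashing the customer ID, rather than drawing a fresh random number each time, keeps each customer in the same group across sessions, which makes the experiment easier to run and to audit.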