Learn Machine, Learn
In this series, we take the mystery out of Machine Learning and explain some of the more commonly used techniques.
At Ixio Analytics, we often get asked what we do and the conversation usually goes something like this:
Q: ‘So what do you guys do?’
A: ‘We use data and analytics to help companies and organisations make better decisions.’
Q: ‘Oh really?’
A: ‘Absolutely.’
Q: ‘How do you do that?’
A: ‘Through a number of advanced computer applications, including machine learning.’
Q: ‘Machine learning? That sounds really sexy.’
A: ‘Thank you. We’re sexy like that.’
Then, just as we’re exchanging business cards, out comes the question that’s been bothering our interlocutor all along: ‘But what exactly is machine learning?’
Machine learning is sometimes confused with data mining. While the two are closely related and indeed overlap, ‘machine learning focuses on teaching computers how to use data to solve a problem, while data mining focuses on teaching computers to identify patterns that humans then use to solve a problem.’
Machine learning is found in a very wide array of applications, including detecting credit card fraud, diagnosing disease, predicting election outcomes, estimating the magnitude of insurance claims before a disaster, and identifying who is most (or least) likely to buy a particular product or service.
One of the foundational techniques of machine learning is cluster analysis. This set of tools is widely used for segmentation: it divides a population into clusters such that the members of each cluster have similar characteristics. In other words, they look alike.
Humans do it too
Human beings have instinctively clustered information since the beginning of time. We categorise a prowling lion as a threat and a smiling mother with a baby as benign. We organise information into homogeneous clusters and slap labels on them that help us respond appropriately throughout the day. Homogeneity is the quality or state of being all of the same kind, i.e. a cluster is homogeneous when its members look and behave in the same way.
In machine learning we effectively follow the same approach, with one difference: a computer isn’t fazed at all by very large data sets with multiple dimensions. It simply gets to work, rearranging the cluster boundaries over and over again, and stops when there is no appreciable increase in homogeneity within the clusters. There are several measures of homogeneity that a friendly mathematician will gladly tell you about. So the next time you’re wondering how your online retailer knew you were partial to sugared rose petal flavoured lip balm, they just may have let a machine loose on your data.
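For the curious, here is a rough sketch of that "rearrange and stop" loop in Python. It uses k-means, one common clustering method; the article does not say which method any particular retailer uses, so treat the choice as an assumption. Homogeneity is measured here by the within-cluster sum of squares (lower means more alike), which is only one of the several measures mentioned above. The function names, the tolerance `tol`, the toy customer data and the simple way of picking starting centres are all made up for illustration.

```python
def within_cluster_ss(points, centres, labels):
    """Total squared distance from each point to its cluster centre.
    Lower values mean more homogeneous clusters."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centres[label]))
        for point, label in zip(points, labels)
    )


def k_means(points, k, tol=1e-6, max_iter=100):
    """Minimal k-means sketch (illustrative, not production code)."""
    # Assumption: start from k evenly spaced points in the list.
    # Real libraries usually pick starting centres randomly.
    centres = [points[i * len(points) // k] for i in range(k)]
    previous = float("inf")
    labels = []
    for _ in range(max_iter):
        # Redraw the boundaries: each point joins its nearest centre.
        labels = [
            min(
                range(k),
                key=lambda j: sum(
                    (p - c) ** 2 for p, c in zip(point, centres[j])
                ),
            )
            for point in points
        ]
        # Move each centre to the average of its current members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
        # Stop when homogeneity no longer improves appreciably.
        score = within_cluster_ss(points, centres, labels)
        if previous - score < tol:
            break
        previous = score
    return labels, centres


# Hypothetical customers: (visits per month, average basket size).
customers = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1),
]
labels, centres = k_means(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The first three customers land in one segment and the last three in the other. On real data the result can depend on where the centres start, which is why library implementations typically try several starting points and keep the best.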