Is your (big) data good enough?

Junk in, junk out.

As companies begin experimenting with the possibilities of big data, this snappy expression has become the universal warning against algorithmic blunders. At its heart is a simple but crucial truth: an algorithm is only ever as good as the training data on which it is built. Put another way: to do big data, you need good data.

Automated decision-making (or AI, if you prefer the fancy term) is an amazing tool. It allows marketers to micro-target like never before and banks to make loan decisions in microseconds, and it is behind the seemingly miraculous technology of self-driving cars. But all of these are underpinned by one crucial building block: if the training data isn’t good enough, then neither is the outcome. Feed a self-driving car the wrong images of zebra crossings, for example, and it runs the risk of charging straight through one and endangering pedestrians. Feed the wrong data to your automated marketing campaigns, and not only do you fail to make the most of this groundbreaking technology, you also run the risk of campaigns being actively counterproductive and antagonising existing and prospective customers.

At Abensour & Partners, we have a rule of thumb: our clients usually underestimate how much data they hold, but overestimate its quality. If GDPR, the EU regulation on data privacy, has had one positive impact, it has been to force companies to ‘dig under the bonnet’ and see how many holes lie in their data. Lots of companies are missing the names or gender of their customers, and some brands with older customers can’t tell from their datasets whether those customers are still alive!
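For readers who want to try this on their own records, here is a minimal sketch of what counting the holes could look like. It is an illustration only: the customer records, the field names and the helper functions `is_missing` and `missing_share` are all made up for the example, and a real audit would need to reflect how your own systems store blanks.

```python
# Hypothetical customer records; the field names are assumptions for illustration.
customers = [
    {"name": "Ada Smith", "gender": "F", "last_contact": "2023-04-01"},
    {"name": "", "gender": None, "last_contact": "2011-09-15"},
    {"name": "Bo Jones", "gender": "", "last_contact": None},
    {"name": "Cy Lee", "gender": "M", "last_contact": "2024-01-20"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_share(records, fields):
    """Return the share of records (0.0 to 1.0) in which each field is missing."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_missing(r.get(field)) for r in records) / len(records)
        for field in fields
    }


report = missing_share(customers, ["name", "gender", "last_contact"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

Even a rough count like this tells you which fields you can rely on and which need attention before any algorithm is trained on them.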

This matters because poor-quality data is costly, both in monetary and reputational terms. Data scientists, despite their eye-watering salaries, spend a huge proportion of their time cleaning data, a task which is time-consuming but doesn’t require much skill: you and I could do it after a day’s training (or if we ended up in one of China’s many ‘data factories’). If your data needs cleaning, you can bet data scientists will charge you a bomb to clean it.

And the cost isn’t just paying your data scientists for unnecessary extra work. Poor data can also lead to blunders with huge reputational costs. In Idaho, for example, some residents with disabilities had their entitlements slashed by the Department of Health’s new “budget tool”, whereas others had theirs increased. It took four years and a class-action lawsuit to get to the root of the problem, but the explanation turned out to be simple: the algorithm was junk, because the data powering it was riddled with errors. Or think of Google’s Photos app, which mistakenly labelled a black couple as “Gorillas”, leading to grovelling apologies and serious damage to the company’s image in an industry already struggling with diversity.

So should we be using big data in our communication strategies? Absolutely. But remember: junk in, junk out. So first, carefully evaluate two questions: what data do I really hold? And crucially, is that data good enough?

Tom Hunter