The days of scrubbing data until it's squeaky clean are quickly becoming a luxury, especially as IT departments answer the business's call to arms for more speed and more agility. But providing data for real-time use raises a fundamental question for CIOs: Just how clean is clean enough?
Experts like Farzad Mostashari, former national coordinator of health information technology for the U.S. Department of Health and Human Services, have persuasively argued that the solution to dirty data is more dirty data. Adding data "provides you with context," he said at an information quality conference last summer. Others, like Michael Berry, analytics director for TripAdvisor's business operations, think otherwise. Those who believe they don't need to worry about clean data because they have so much data "are just wrong," he said at a predictive analytics event last fall.
Greg Pfluger, vice president of information systems at American Family Insurance, had another take. In this SearchCIO Ask the Expert, Pfluger addresses an audience of CIOs and IT leaders at the Fusion 2014 CEO-CIO Symposium in Madison, Wis., and answers the question: More data or clean data?
Greg Pfluger: That is probably one of the key questions the IT community will struggle with over the next five years. We have all of these emerging, external data sources that we need to integrate for various business reasons -- but it's not always obvious when something is just another piece of data and when it needs to be high quality. I don't think we're going to emerge from this discussion with hard-defined standards, where we agree on an industry standard of two or three categories and we all align to them. The categories are going to differ over time from business to business and industry to industry. I would encourage IT leaders to think about how they need to categorize these things in their particular analytics environment.
For example, a CIO might define three different buckets. In the first, we really don't care about the cleanliness of the data; we're just trying to make our marketing efforts better. If our marketing goes from 2% to 3%, we're brilliant. We're doing that with a lot of suspect data, but that's OK.
Contrast that with marketing data that's used to target an existing customer. This might be the second bucket, where the quality has to be a lot better. I'm sure we're all occasionally irritated by our cable providers based on service and pricing. I get annoyed by mine at least once a month because somewhere along the line, their old customer database and their current customer database don't match. I canceled my service when I moved from Stevens Point, Wis., and I already had new service in Madison from the same provider. But now they keep trying to market to me to come back, even though I'm an existing customer. It's the same email account I use to get billing notices, so it seems I can't unsubscribe, because then I'll stop getting my bills. They have data integrated on some level, but it's at the wrong level, because they don't recognize me as a current customer.
The third bucket might be when you're trying to enable a transaction and you'll need very high quality data. I heard from a colleague recently who experienced flight delays due to winter storms. At the end of it, he became an even more loyal customer to Delta because of the way they used information to update him and automatically rebook his flights. He's even willing to pay extra to fly Delta now -- despite canceled flights!
The next generation of service for airlines, I'm sure, is going to be more proactive. They will know a storm is coming in and offer customers the chance to fly out one day earlier at no charge, because it gets them where they want to go and helps reduce the chaos in the airline's system. You'll probably like getting an email that offers you an earlier flight out, and maybe you'll even pay extra for it. But what happens if the storm data is wrong, or your address is wrong, or the airport you're flying out of is wrong, and you get this email? You'll say, 'Wait a minute, there's no storm forecasted,' and the airline will lose credibility. Those kinds of transactions require a high degree of accuracy.
IT departments need to think about what categorizations are right for their organizations and put the right amount of data governance practices and data quality processes in place for each category.
As reported by Nicole Laskowski, senior news writer