The human factor in data classification

It’s human nature to want to categorize the world around us. But is this desire to classify becoming a bigger part of our nature at the expense of creativity because of … IT?

Computers encourage us to classify and categorize, but following a corporate or coded script doesn’t exactly lend itself to creativity.

Our recent SearchCIO360 breakfast speaker, Professor Gary King, used customer service reps as an example of how data classification or categorization can stifle innovation. Most companies develop categories for customer complaints, and the rep is typically asked to type information into a program with fixed complaint categories.

“There are huge efforts to convince [the reps] not to come up with a new idea because, if they do, you have to reclassify millions of previous data points. So what you are doing is taking the people who are good at innovating, the ones with the most insight into the customers, and telling them to please not come up with any new ideas,” King said.

We are allowing computers to make decisions for us, which can be a wonderful thing — if the data being used to make those decisions remains relevant. Many times, it doesn’t.

To err is non-human and human

Take King’s research into the solvency of U.S. Social Security benefits. If you think some of the data classification and management methods in your office are outdated, you might feel a little better about them in comparison to the Social Security Administration’s use of a data analysis method developed 75 years ago, according to King. This method, used to forecast mortality rates, was developed by demographers at a time when obesity was not a problem and smoking was not considered the death knell it is today. Still, this 75-year-old method is the one being used today to determine when Social Security may run out of money.

By King’s calculations, that date is around 2032. It is a bit of “a bummer,” as he said, but it is a more accurate estimate. His method incorporates the deep qualitative knowledge that demographers have used over the last 350 years, updates it with new mortality factors (such as the rise in obesity) and automates the calculations with his own algorithms.

This is an oversimplification: King conducted his research not just on Social Security solvency, but also on aggregated worldwide mortality data, analyzing 150,00 cross-sections. But his point was that he did not seek out new data; he just analyzed existing data better and relied on a combination of qualitative (characteristics identified by humans) and quantitative (characteristics that can be measured) computer-aided methods.

“Fully human is inadequate, but fully automated or fully quantitative — meaning Excel spreadsheet with no labels — fails too. You need some qualitative information to decide what you’re trying to quantify. What really is needed is computer-assisted, human-controlled technology to take qualitative information, systematize it and then provide it back to the human being who actually ends up making the decisions,” he said.

Another case in point on the benefits of combining a human touch with statistically driven tech? One of King’s colleagues was running a complex data analysis that caused him to run out of room on his computer. IT told him that it would cost $2 million to give him a system to support his data analysis needs. Instead, a couple of King’s grad students spent two hours developing a new algorithm that does the job in 20 minutes — on the guy’s laptop!

The title of King’s talk at our breakfast was “Big Data is not about the Data.” His contention is that you can have all the data you want and the most powerful computer you need, but without the right analytics, which include that human, qualitative element, you could be looking at some costly, outdated data.

As one CIO explained, “There is a real sense of urgency to figure out how to build the right skill set to take advantage of qualitative data, qualitative people.”