At Harvard University's Societal Impact through Computing Research event, one of the participants shared a story about meeting an insurance company's IT leader who had moved to Hartford, Conn., six months prior. The IT leader was "completely dedicated to his navigational system," the participant said, and only understood the city through the GPS device. Without it, he didn't know which direction was north.
But I understood the guy's point: Big data -- like the stuff that's enabling GPS devices to provide real-time directions -- is changing how we understand the world. When a GPS device tells us to turn right, we turn right. And because we're so willing to integrate new technologies into our lives, we run the risk of understanding the world in ways that are sometimes factually and ethically wrong.
That was one of the points made during a roundtable discussion at the event. (The 10 academics and/or industry professionals asked for anonymity to ensure an open and frank dialogue.) The exchange got me thinking about big data ethics and the role of the CIO. Leaders of IT are often told to consider the business problem rather than the technology solution as they embark on big data projects. But once the business aims are identified and achieved, what effect will the big data project have on society?
The advantages of having more rather than less data are well-documented. But what happens when a society becomes so reliant on data that it trusts the data implicitly, acting on it without pausing to consider the consequences?
I run across this in my reporting on big data analytics. Bad analytics led researchers to insist, for example, that unemployment was on the decline -- because their social media analytics program mistook the flood of mentions of "Jobs" following Steve Jobs' death for "jobs," as in work.
That misinterpretation was a failed experiment, but examples abound of big data analytics intruding on private lives. The Target teenage pregnancy incident is one such example, and so is a recent story I heard at the Gartner Catalyst conference about a large financial institution looking to predict customer churn. The bank uncovered a pattern of customers who were preparing to leave, but it failed to realize that many of the customers it was about to court with offers to stay were spouses quietly getting their finances in order before filing for divorce.
In each of these cases, the error was a contextual one. (Can we call that a data quality issue?) As one participant, a computer science PhD student, said, "I spent time as a data scientist and my colleagues were smart, but they came up with solutions that didn't make sense." Some big data projects work in a vacuum but don't hold up in the real world.
When data leads to discrimination
When it comes to big data ethics, contextual errors are the tip of the iceberg. The bigger concern is what blind faith in the data -- as opposed to subjecting the analysis to scrutiny and critical thinking -- will lead to. What will the business do, for example, with correct but potentially unethical correlations? What happens when big data helps businesses perpetuate stereotypes or discriminatory policies rather than dispel them?
That may sound farfetched, but examples are already materializing of algorithms that reinforce discrimination. Researchers from Carnegie Mellon University recently published a paper on how Google's online advertising system targeted men more often than women with ads for jobs with high salaries. Why? Google provided this response in a statement to news outlets like The New York Times and The Washington Post: "Advertisers can choose to target the audience they want to reach, and we have policies that guide the type of interest-based ads that are allowed."
Data and analytics have become a hotbed of innovation. And that can mean, as one participant pointed out, it's often easier for businesses to seek forgiveness later than ask permission up front. "Even if you wanted to ask permission, the legal system doesn't have a process for permissioning. The system has not caught up with the technology," one participant, a lawyer from Harvard's Berkman Center for Internet and Society, said.
And even if such a process existed, data scientists aren't lawyers and vice versa. Perhaps, as the PhD student said, it's time to figure out where computation ends and the discussion of the ethical and legal ramifications of big data begins. "Computation can inform the discussion that will inevitably have to be decided on in the legal arena or the political arena," he said.
Big data ethics isn't just academic
In case you think big data ethics discussions are purely academic, rest assured they are not. The topic also came up at the recent Strata + Hadoop World conference in New York City.
DJ Patil, chief data scientist at the United States Office of Science and Technology Policy, stood on stage and issued a call to arms. "My ask is that every training course, every curriculum, every MOOC, every college class, every professional degree, every program at a company has a data ethics curriculum that is intrinsic -- not some bolt on, but intrinsic -- to the training of every data scientist, every computer scientist, every data engineer, every data operations person," he said. "We must lead by defining what this program needs to look like."
Later, Patil led a well-attended two-part session on data ethics that had attendees from the banking, healthcare and retail industries, to name a few, echoing what I heard at the Harvard event.
When one attendee asked Patil how to help, he said: "This is what help looks like: Us getting together, taking ownership of the issues and starting to define it as a community. Here's what help doesn't look like: It's a bunch of people who don't work in this space and write a paper and say, 'Here's your new ethics standards. Tough luck guys.'"
In 2013, Babson Professor Tom Davenport made the bold prediction that the next big data trend would be lawyers. Read his thoughts on why. Then, listen to a podcast about applying ethical codes to big data stewardship and check out another, which discusses the importance of not ignoring the debate over privacy and big data.