
Data silos in big data analytics: Now you see them, now you don't?

Big data makes the problem of data silos even bigger. Or does it? Either way, CIOs must make big changes to get the most out of big data analytics.

Data silos have plagued business intelligence efforts for as long as businesses -- and their CIOs -- have been trying to extract meaningful BI from data. The existence of data silos means that the slaved-over, costly, single-version-of-the-truth database being tapped for great insights is really only a partial version of reality. And therefore the answers that database yields quite possibly are not the right answers after all. Garbage in, garbage out, as the BI pros say.

Enter big data analytics -- the buzzword for the data triple-V (variety, volume and velocity) that inundates most companies today. And the data silo plague grows -- exponentially so, according to analyst Ted Friedman.

"You've had silos for all time inside your company. Now, with the big data phenomenon, you have silos that reside across the universe -- inside your firewall, out on the Web, in the cloud, data that could be owned by other enterprises or by your customers and suppliers," said Friedman, who covers information management at Gartner Inc. "All those things create a higher level of challenge in breaking down the silos and getting to some meaningful analytics across all that."

So, what is the CIO's role in making meaning out of big data? Like so many IT-related issues in the enterprise, that problem and its possible solutions involve people, process and technology. CIOs not only will probably need to add skills to their own staffs (for example, by hiring data scientists, mathematicians and information architects), but also will have to convince the business that big data governance is a concern for the executive suite and even the boardroom.

Data management is suddenly sexy

One approach to dealing with the problem of data silos in big data analytics is to limit the focus, through a process Gartner is calling information valuation. "Not all that data out there in the big ocean of data has the same degree of value. The challenge is in carving down the whole problem space to what is meaningful," Friedman said. "I see clients setting the scope much too broadly."

Companies can narrow their focus by asking questions like these: What can we truly get out of this data? Where does it connect with our business? How can we generate positive return using it?

As businesses become more attuned to the potential business value lurking in big data, Gartner is seeing more companies forming data governance boards. Made up of business-side stakeholders, these boards are tackling everything -- from which data sources are important and where to invest resources to questions about data quality, retention, integration, security and privacy.

Dangerous explorations outside data silos

Deriving value from large data sets requires opening them up for exploration by lots of people, not just a handful of IT experts. Gartner and others worry, however, that in their rush to harness big data, many organizations could lose sight of governance and pay the price in privacy breaches, data fraud and other problems associated with gaining access to lots of data.

"In the enterprise, it is not a practical approach," said Boris Evelson, principal analyst at Forrester Research Inc. in Cambridge, Mass. "There are all sorts of regulatory issues, conflicts of interests. The 'Chinese wall' between an investment bank analyst and trader comes to mind."

Protecting the integrity of data is a huge concern at the National Snow and Ice Data Center at the University of Colorado in Boulder and for NASA, NSIDC's data-collecting partner, said David Gallaher, the center's IT services manager. His job is to help manage, process and distribute petabytes of scientific data on the world's frozen realms, and to provide universal but controlled access to it. "We want to let people look at the data, but we have to make damn sure they can't change it," said Gallaher, a geologist by training. The organizations' scientists, on the other hand, modify the form the data takes every time an algorithm is tweaked. So, governance must be put in place to make sure the "right people are doing the changing," he said. The NSIDC is currently taking part in the National Science Foundation's efforts to address data governance.

Multiple views of data, not multiple copies

Not everyone agrees that big data means more data silos. Anjul Bhambhri, vice president for big data projects at IBM, makes the argument that big data "is really helping" CIOs.

"Now they are able to remove themselves as a bottleneck," Bhambhri said in an interview about breaking down data silos at the 200-some companies she has worked with over the past year. One large enterprise had 13 data marts for its email archive (eight from legal alone) because the departments that needed access to the archives couldn't wait for IT to get them answers. At another company that analyzes cookies for a website, two departments made their own copies of the data. "This is 15 billion cookies a day," she said.

New technologies -- including, of course, IBM's big data products -- let companies store and analyze huge amounts of data in a single data repository. So, instead of 13 email data marts of stale data or duplicate copies of 15 billion cookies, for example, a company has an active archive that can be queried by multiple groups. "You have data in one place and have multiple applications running on it at the same time, because no data is getting transformed at the storage level," Bhambhri said. Nevertheless, even she and other evangelists like her concede that reaping the benefits of big data requires a lot of changes for IT. "Just being able to store the data is a major step in the right direction. But if it can just be stored and not analyzed, it is not good," she said. "It takes a lot of algorithms."

Let us know what you think about the story; email Linda Tucci, Senior News Writer.
