
Consider these five questions before you tackle large-scale data

Large-scale data processing isn't easy, but it can give your organization a leg up on the competition. Here are five questions to get you started.

Uncovering the secrets lurking in large-scale data sets is companies' next best hope for beating out the competition -- or so the recent torrent of Big Data reports proclaims. But first, IT needs to crack the code. If your IT organization is puzzling over what Big Data actually means and how it should be using large-scale data sets, it is not alone.

In May, analyst Boris Evelson and his colleagues at Forrester Research Inc. queried 4,000 clients about their organizations' understanding and use of Big Data. When he went through the responses, however, he noticed something unusual: As the questions became more challenging, respondents dropped off in droves, with only 40 people completing the full survey. He had never seen so many abandoned survey questions. "Being experienced analysts, we thought this was going to be another piece of research where we survey the market, talk to people and come up with some best practices," said Evelson, a business intelligence (BI) expert at the Cambridge, Mass.-based firm. "Within a few days, I came to the conclusion that at this point, the only thing we can offer our clients is to tell them what kind of questions to ask."

So, while the world waits for best practices, we asked the experts to at least give us answers to five questions that CIOs should consider before they tackle large-scale data processing. This advice is based on recent interviews with Evelson and his colleague, Brian Hopkins, an analyst specializing in the effect of emerging technology on IT enterprise architecture; Yvonne Genovese, a Gartner Inc. research vice president covering business applications; and her colleague, Mark Beyer, a research vice president specializing in data integration and BI.

1. Why is it so hard to mine Big Data? There are already extraordinarily sophisticated BI tools.

Despite the name, Big Data is only marginally about volume, our experts said. A petabyte of data can be loaded easily into any large, scalable data warehouse. Traditional databases, however, can't keep up with the variety of data that's characteristic of Big Data or with the intermittently high velocity at which the data is delivered, Hopkins said. The traditional BI ecosystem "really works as long as you're thinking about the 1% of data that is available to your company. That 'big database in the sky' you've put in for enormous amounts of money can't capture the value in the other 99% of data that is dropping through the cracks," he said.

Gartner has delineated 12 "dimensions" of Big Data that CIOs should tackle one by one -- ideally, according to a plan hammered out with the business -- or risk blowing up their current systems. Solving the infrastructure issues will be a challenge for all but about one-third of companies, according to the firm, but the really hard part "is separating the news from the noise," Beyer said.

2. Is the software development cycle for large-scale data processing the same one we use for traditional BI?

"Absolutely not," Forrester's Evelson said. In traditional BI projects, business requirements come first. "You talk to business users, define the requirements, put them down on paper, architect them and implement. Big Data is mainly about the fact that I don't even know enough about what is out there to give you my requirements," he said. People need to explore. Requirements could coalesce quickly, but it's just as likely that initial theories will not be supported by the results and more exploration will be needed.

Now that the business is asking CIOs to make Big Data a component of computing operations, IT's inclination will be to "productionalize" it, by wrapping it with tenets of large-enterprise IT, such as security, scalability and disaster recovery, Evelson said. But standard operating procedures -- such as securing access to the data by role and department -- don't always apply to Big Data. "If you secure something, you can't explore it; if you can't explore it, you can't find patterns; and if you can't find the patterns, you don't know what the requirements are going to be," he said.

Putting an interesting discovery into production to test it out, as you would in traditional BI, likewise is not useful because the data being explored in Big Data -- social media, for example -- changes by the second. The exploratory analysis you run today will not confirm the one you ran yesterday, Forrester's Hopkins said. To go back and "re-operate" on the data requires technology that captures a snapshot of that raw data.

What the IT organization needs to do is to let the people who know how to do this work do it, while it supplies the technology that facilitates the work. Down the line, it can be integrated with existing IT systems, Hopkins said.

When it comes to Big Data, you also can forget about the vaunted "single version of the truth" that's sacrosanct to BI, the experts agreed. In Big Data land, there is only a single version of the facts, Gartner's Beyer said. Anomalies no longer are outliers but just another data point: "Data quality is an inherent rating of my data, not something I have to clean up."

One other point about versions of the truth: Other parts of the enterprise have competing approaches to managing and analyzing Big Data that are not beholden to IT. Operations, for example, collects data, such as meter readings, with data historian software layered with BI tools. That data represents Operations' interpretation of the truth. And when it starts rolling in, you need a governance model in place, Beyer said. "If you take that same data and put four analysts, each with a MapReduce engine and their own context of what separates noise from news -- that's when the real discussion begins."
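To make Beyer's "four analysts, each with a MapReduce engine" concrete, here is a minimal, hypothetical sketch of the MapReduce pattern in Python. The data and names are invented for illustration (simple meter readings, as in the Operations example above); a real deployment would run the map and reduce steps in parallel across a cluster, but the three-phase map/shuffle/reduce logic is the same.

```python
from functools import reduce
from collections import defaultdict

# Hypothetical raw records -- e.g., meter readings collected by Operations.
records = [
    "meter A 5", "meter B 3", "meter A 2", "meter B 4",
]

# Map step: turn each raw record into a (key, value) pair.
def map_record(record):
    _, meter_id, value = record.split()
    return (meter_id, int(value))

# Shuffle step: group all values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce step: aggregate each key's values (here, a simple sum).
def reduce_group(values):
    return reduce(lambda a, b: a + b, values)

pairs = [map_record(r) for r in records]
totals = {key: reduce_group(vals) for key, vals in shuffle(pairs).items()}
print(totals)  # {'A': 7, 'B': 7}
```

The point of Beyer's remark is that each analyst could legitimately write a different map or reduce function over the same raw records -- summing, averaging, or filtering "noise" -- and each result is a defensible reading of the same facts.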

3. Which people skills do we need to find business value in large-scale data?

According to our experts, cracking the content part of Big Data is largely a math problem that's best dealt with by professional statisticians and mathematicians. To tackle an analysis of brand reputation (a pretty straightforward project), ideally you'd have a marketing person with a doctorate in mathematics who can build a mathematical model for looking at the data. Someone who understands the sophisticated statistical mining techniques required to sift through enormous amounts of raw data for business insights is, not surprisingly, in a special class of people.

"Probably the best degree programs that produce people with these skills are the hard-core master's and Ph.D. computer science programs," Forrester's Hopkins said. And if Google, Facebook, Yahoo and Wall Street haven't snagged those graduates, "China is hiring them at enormous salaries," he added. He suggested raiding insurance companies for actuaries.

A sobering thought: "Our top recommendation is probably going to be that, if you're thinking of moving into this space, the first thing you need to do is build a team. And if you can't get the expertise to know what to do with the data, then don't bother with the technology at this point," Hopkins said.

4. Where do I begin?

With Big Data, the experts' universal advice was to start small in scope -- and maybe even with the technology. "Look at data that is specific to a problem that you are having," Gartner's Genovese said. Find the newer technologies that target certain use cases -- for example, Autonomy Corp.'s software for the legal industry, she advised. To address Big Data's stressors on IT systems, technology fixes should be coordinated with the business problem, Beyer said. "The CIO can start off in a reactive mode to the business needs in the short term," he said, "and in the long term consciously figure out what to roll out next on the technology side."

5. How do I define success?

Success might not be monetary, and it will probably require a lot of trial and error. Success in Big Data comes from setting low expectations. Provided you have the people to do the work, "try three or four projects, and don't kill yourself thinking about how to approach them," Forrester's Evelson said. "By the time you have 10 of these under your belt, you'll probably be able to discern some best practices and lessons learned."

Let us know what you think about the story; email Linda Tucci, Senior News Writer.
