Many businesses now have an understanding of what constitutes big data; making a success of big data is another matter. Just ask Doug Laney, an analyst with Gartner Inc.; Mike Gualtieri, an analyst with Forrester Research Inc.; and Robert Morison, lead faculty member at the International Institute for Analytics -- three big data experts whose work gives them a view into how companies are using big data. Here are the factors they see as contributing to big data success -- and the ones that can spell big data failure.
Big data do's
Do start small
CIOs have heard this advice before, but what does starting small mean? "It means starting with a business domain where you sense an opportunity to move the proverbial needle in performance and an opportunity to learn by pulling in more data," said the Institute's Morison.
He pointed to a pharmaceutical manufacturing company that wanted to improve its product yield by just 1% to 2%. With traditional BI tools, it could analyze only a limited amount of its manufacturing history, which gave it limited insight into where the process might be tweaked to increase production. The company wondered if analyzing more data could help pinpoint the real drivers of manufacturing performance, so it procured open source, Hadoop-related technology that enabled it to load three years of production history in a matter of weeks.
"Shortly thereafter, they're developing heat maps of the combinations of variables -- in this case, things like pressure, temperature, agitation and timing -- that can lead to better product yield," Morison said. "So, in a matter of months, they went from exploring what they could do if they looked at more data to actually launching experiments on their manufacturing facility to improve their yield."
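To make the heat map idea concrete, here is a minimal sketch of the kind of analysis Morison describes: group past batches by ranges of two process variables and average the yield in each cell. It is not the company's actual method. The batch records, the bin widths, and the function names (`bin_floor`, `yield_grid`, `best_cell`) are all invented for illustration, and only two of the four variables he mentions are used.

```python
from collections import defaultdict

# Hypothetical batch records: (temperature in C, pressure in bar, yield in %).
# Every value here is invented for illustration.
BATCHES = [
    (30.5, 1.1, 90.0),
    (31.0, 1.2, 92.0),
    (30.8, 2.1, 95.0),
    (35.2, 1.3, 88.0),
    (36.0, 2.4, 91.0),
    (35.5, 2.2, 93.0),
]


def bin_floor(value, width):
    """Return the lower edge of the fixed-width bin that contains value."""
    return (value // width) * width


def yield_grid(batches, temp_width=5.0, pressure_width=1.0):
    """Average yield for each (temperature bin, pressure bin) cell."""
    cells = defaultdict(list)
    for temp, pressure, batch_yield in batches:
        key = (bin_floor(temp, temp_width), bin_floor(pressure, pressure_width))
        cells[key].append(batch_yield)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


def best_cell(grid):
    """Return the (cell, average yield) pair with the highest average yield."""
    return max(grid.items(), key=lambda item: item[1])


if __name__ == "__main__":
    grid = yield_grid(BATCHES)
    # Print the grid as rows; a plotting library could color these cells.
    for (temp_bin, pressure_bin), avg in sorted(grid.items()):
        print(f"temp >= {temp_bin:.0f} C, pressure >= {pressure_bin:.0f} bar: {avg:.1f}%")
    print("best cell:", best_cell(grid))
```

Each cell of the grid corresponds to one square of a heat map. Because every new batch is just another record, rerunning the function on the growing history gives the kind of feedback loop described in the example.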
It's time for CIOs and business leaders to deviate from the consumption- and goal-oriented project management style of traditional IT, Morison said. Instead, encourage experimentation and creative thinking. In his pharmaceutical manufacturing example above, "the objective was to learn as they could and improve as they go," he said. "What's really great about this application, once they started doing it, every new manufacturing batch becomes part of the database. They've got a constant feedback loop. It's a little bit of an experiment to make things better and better."
Gartner's Laney said experimentation should include "integrating data sources that perhaps don't naturally go together." Retailers, for example, are analyzing security camera feeds "to understand the flow of their customers in stores," giving them a chance to identify shopping profiles and shopping patterns, he said.
Do pull the trigger on Hadoop
Big data isn't just Hadoop, "but Hadoop is a big catalyst for it" because it's cheap and easily accessible, Forrester's Gualtieri said. For many of the companies seeing success with big data, Hadoop is somewhere in the background. "Adopt Hadoop. Make that your experimental platform for your data because you can get all of the data together relatively cost effectively," he said.
Do leverage dark data
Laney refers to the corporate data that's stored and never seen or heard from again as "dark data," and he encourages CIOs to consider the wealth of possibilities they're sitting on. Some businesses already are. Insurers, for example, are running text mining tools over old adjuster reports to better understand fraud or trends in the insurance business, Laney said.
Plus, exposing dark data to the light of day could lead to new, worthwhile revenue streams. Dollar General pays for its enterprise data warehouse by sharing consumer packaged goods data with clients, Laney said. And software-as-a-service provider Clothes Horse, a startup that's helping online shoppers determine the perfect fit, is analyzing its customer data to give retailers more visibility into customer preference. New platforms are also cropping up to help distribute and sell data from a range of vendors, Laney said, including: Microsoft; ProgrammableWeb, acquired by MuleSoft in 2013; Data Market, acquired by QlikTech last fall; and qDatum, a startup based in Germany.
Big data don'ts
Don't give in to the R craze
While the open source programming language R is commonly associated with data science, CIOs don't need to hire data scientists who know R to jump-start an advanced analytics program. Off-the-shelf software will get companies pretty far. Just as CIOs wouldn't ask a Java developer to program a business intelligence report these days, the same holds true for advanced analytics, according to Gualtieri. Tools from Alpine Data Labs, Alteryx, SAS, RapidMiner and KNIME are mature enough to do about 80% of the predictive analytics jobs without having to build everything from scratch, he said.
Don't just report on the data
Pushing past traditional analytics is one of the biggest differentiators between businesses that are making big data work for them and those that aren't. "This goes beyond pie charts and bar charts," Gartner's Laney said. "Start integrating data into business processes -- and not just reporting on the data." Gualtieri also sees advanced analytics as a differentiator. "Can you do more traditional reporting and better reporting with big data? Yes, but that's just more of the same. The competitive differentiator is when you actually create predictive models on that data," he said. Unfortunately, along with a dearth of data scientists, Gualtieri said the imagination to push beyond traditional analytics is in short supply.
Don't think analytics will automatically be adopted
Morison said an analytics pitfall he sees frequently is that "reasonably good analytics are done, but not adopted." Avoid the snag by working closely with the business, he said, a tip that was reinforced for him in recent conversations with a couple of chief analytics officers: "They have business partners every step of the way or they don't start, even when they see something worth doing," he said.