Although there is no commonly accepted definition of big data, there is growing evidence that, when approached in the proper manner, big data holds big business value.
Industry publications are filled with articles that deliberate, and sometimes even pontificate, about whether big data is hype or reality. The good news is that the trend is toward case studies that demonstrate how leveraging big data has created significant business value. This is somewhat encouraging, but acceptance of big data still falls short, as do the shared understanding and the institutionalized concepts, facilities, tools and methodologies that we have come to expect from our "big" technologies.
Some people talk about big data in terms of size -- e.g., petabytes (1,024 terabytes), exabytes (1,024 petabytes) and zettabytes (1,024 exabytes). To put these terms into a context that we mere mortals can understand, an exabyte is equal to two to the 60th power (that is, 1,152,921,504,606,846,976) bytes.
Still not clear? Can't imagine why. Here's a comparison that is perhaps a bit more comprehensible. If you were to stack an exabyte of regular Oreo cookies (not the Double Stuf kind) into a neat tower of pleasure, the height of the tower would be the equivalent of 19,041,819 round trips to the moon. Still unimaginable? How about 48,938 round trips to the sun?
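For readers who want to check the arithmetic, here is a minimal Python sketch of the unit ladder and the cookie-tower comparison. The cookie thickness and the two distances are assumptions chosen for illustration; the article does not say which values were used, and the function and constant names are hypothetical.

```python
# Binary storage units as the article defines them: each step is a factor of 1,024.
UNIT_NAMES = ["kilobyte", "megabyte", "gigabyte", "terabyte",
              "petabyte", "exabyte", "zettabyte"]
BYTES_PER_UNIT = {name: 1024 ** (i + 1) for i, name in enumerate(UNIT_NAMES)}

# Assumptions for illustration only; none of these values appear in the article.
ITEM_THICKNESS_M = 0.0127      # hypothetical cookie thickness (about half an inch)
EARTH_MOON_M = 384_400_000     # approximate mean Earth-moon distance, in meters
EARTH_SUN_M = 149_597_870_700  # one astronomical unit, in meters


def round_trips(item_count: int, thickness_m: float, one_way_m: float) -> float:
    """Number of round trips that a stack of identical items would cover."""
    stack_height_m = item_count * thickness_m
    return stack_height_m / (2 * one_way_m)


if __name__ == "__main__":
    exabyte = BYTES_PER_UNIT["exabyte"]
    print(f"1 exabyte = 2**60 = {exabyte:,} bytes")
    moon = round_trips(exabyte, ITEM_THICKNESS_M, EARTH_MOON_M)
    sun = round_trips(exabyte, ITEM_THICKNESS_M, EARTH_SUN_M)
    print(f"Round trips to the moon: {moon:,.0f}")
    print(f"Round trips to the sun:  {sun:,.0f}")
```

Under these assumed inputs the results land close to the figures quoted above. A thinner or thicker cookie scales both results proportionally.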
According to IDC's 2012 Digital Universe Study:
- In 2012, only 0.5% of the world's data was being analyzed;
- In 2012, 2.8 zettabytes of data will have been created and replicated;
- Projected growth in data is largely attributed to the worldwide proliferation of PCs, smartphones and the Internet, especially in emerging markets; and
- Data from machines, such as surveillance cameras and smart meters, has contributed to the doubling of the digital universe within the past two years alone.
The IDC study offers the following predictions for where big data will be by 2020:
- The digital universe will reach 40 zettabytes, a 50-fold growth from the beginning of 2010;
- 40 zettabytes will be 5,247 gigabytes per person worldwide;
- Emerging markets will supplant the developed world as the main producer of the world's data; and
- Spending on IT hardware, software, services, telecommunications and staff that could be considered the "infrastructure" of the digital universe will grow by 40%. Investment in targeted areas like storage management, security, big data and cloud computing will grow considerably faster.
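The headline figures in these lists can be related to one another with a little arithmetic. The sketch below is a rough check rather than anything taken from the IDC study: it assumes decimal units (1 zettabyte = 10^21 bytes, 1 gigabyte = 10^9 bytes) and a 10-year span from the beginning of 2010 to 2020, and the function names are hypothetical. With the binary multiples of 1,024 used earlier in this article, the implied population would come out roughly 10% higher.

```python
# Assumption: decimal (SI) units. The study's own convention is not stated here.
ZETTABYTE = 10 ** 21
GIGABYTE = 10 ** 9


def implied_population(total_zb: float, per_person_gb: float) -> float:
    """World population implied by a total volume and a per-person share."""
    return (total_zb * ZETTABYTE) / (per_person_gb * GIGABYTE)


def starting_size(final_size: float, fold_growth: float) -> float:
    """Size at the start of a period, given the final size and the fold growth."""
    return final_size / fold_growth


def annual_growth_rate(fold_growth: float, years: float) -> float:
    """Compound annual growth rate that produces `fold_growth` over `years`."""
    return fold_growth ** (1 / years) - 1


if __name__ == "__main__":
    # 40 ZB shared out at 5,247 GB per person
    print(f"Implied population: {implied_population(40, 5247):,.0f}")
    # 50-fold growth ending at 40 ZB
    print(f"Implied size at the start of 2010: {starting_size(40, 50)} ZB")
    # Assumption: the 50-fold growth is spread over 10 years
    print(f"Implied yearly growth: {annual_growth_rate(50, 10):.1%}")
    # Doubling within two years, as in the earlier list
    print(f"Yearly growth for a two-year doubling: {annual_growth_rate(2, 2):.1%}")
```

Under these assumptions, the per-person figure implies a world population of about 7.6 billion, and 50-fold growth over a decade works out to just under 48% compound growth per year.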
Whether you believe these numbers or not, there seems to be an emerging, inescapable conclusion: Size does matter. But even considering that we are living in the age of 16-plus-ounce soft drinks, 2,700-plus-foot skyscrapers, 7-plus-pound tomatoes and 8-plus-foot-tall people, size alone does not seem to account for what's so big about big data.
According to a recent study performed by the TechAmerica Foundation, Demystifying Big Data: A Practical Guide to Transforming the Business of Government, big data is defined by "the rapid acceleration in the expanding volume of high velocity, complex and diverse types of data." Here, we find reference to factors beyond size, including speed, complexity and diversity of data.
The TechAmerica study tells us that 15% of the information in existence today is structured -- i.e., the traditional data fields within records within data files, and rows and columns within relational databases and spreadsheets. That means that 85% of information in existence is unstructured -- i.e., contained within social media sites, recorded conversations, videos and email. Understanding the nature and meaning of unstructured information presents significant challenges that far exceed the capabilities of typical business intelligence tools that were, for the most part, designed and built for handling the 15% that is structured.
Even a casual review of any respectable business or technology publication makes it obvious that much of the expected growth in information will come from mobile devices, sensor-based devices and social media, making it likely that the 15% will shrink by comparison as the 85% grows, creating more and more diversity and complexity.
But the real issue here is not about the technologies and the big data. It is rather about the notion that these technologies -- and all of the big data that they are generating -- are also enabling and driving fundamental shifts in the way we work, play and generally interact with each other. Being constantly connected almost demands real-time interaction models. It means that "it's in the mail" is no longer an acceptable response. I've sent you a text message and I expect an immediate response. As my 20-year-old son likes to remind me on a regular basis, "C'mon Dad, email is for old guys."
Business models used to be based on looking at historical data to determine what to do during the next 12 to 24 months. Business models are now based on looking at what happened during the past few minutes (or seconds) to determine what to do during the next 12 to 24 minutes (or seconds). Marketing used to be based on abstracting segments or population samples in order to predict the propensity or responsiveness of individuals within the segment to targeted products and services offered in campaigns that lasted weeks or months. Marketing is now based on analyzing each individual's behavioral and experiential information and providing tailored offers to that individual in real time at the point of contact -- call center, website, mobile application, and so on. Conceivably, no two individuals will ever receive the same offer, and once an individual receives an offer, that same offer may never be repeated. This is the big deal -- it's the real "big" in big data.
In thinking through the implications of this big data stuff and what all this means to us as IT executives and to the enterprises that we serve, a few important considerations and next steps become apparent:
- If you haven't already, engage your leadership in conversations about what is happening and why it's important to the enterprise, to your shareholders and to your customers;
- Support your leadership in evolving their enterprise business strategy to adapt to and exploit these new business models and technologies and the big data capabilities that enable them;
- Ensure that your information and data strategies and governance processes are aligned with your business strategies and models;
- Manage these efforts as you would an agenda that supports innovation -- small, short-term, incremental efforts that are manageable and that yield measurable and meaningful business results; and
- Accept the fact that there will be failures along the way. Learn to recognize and learn from them so that the probability of success increases with subsequent iterations.
If, for whatever reason, you or your enterprise remain unconvinced about big data's potential, perhaps you might want to consider the following from the recent book The Human Face of Big Data by Rick Smolan and Jennifer Erwitt: "During the first day of a baby's life, the amount of data generated by humanity is equivalent to 70 times the information contained in the Library of Congress." One can't imagine how many Oreos that would be.
Harvey Koeppel is the president of Pictographics Inc., a management and technology advisory and consulting services firm. He is also vice chairman of the World BPO/ITO Forum. From May 2004 through June 2007, Koeppel served as the CIO and senior vice president of Citigroup's Global Consumer Group. Write to him at email@example.com.