Hadoop is not enough! Just ask Ken Rubin, director of analytics for Facebook Inc., who delivered what CIOs probably consider a refreshing message at the Strata Conference + Hadoop World 2013 in New York: Facebook needs the relational database.
"We're a young-enough company that we started by using Hadoop as our core data technology rather than relational [databases]," he said. "As we start thinking about big data from the perspective of business needs, we're realizing that Hadoop isn't always the best tool for everything we need to do."
When a Web 2.0 superstar -- and Hadoop exemplar, at that -- says there's a time and place for relational technology, CIOs have another bit of proof, if any were needed, that building for big data isn't the black-and-white proposition some Hadoop zealots make it out to be. It's shades of gray, because what matters for the business at the end of the day is solving business problems. Thinking about big data in those terms rather than in terms of tools or architecture "opens up the possibilities of using a much broader range of technologies," Rubin said.
So when exactly does Facebook's analytics team use relational technology rather than Hadoop? That depends on what they're looking for and when and how they want to see the data. "Exploratory analysis," such as pinpointing what metrics really matter, is done in Hadoop; "operational analysis," such as slicing and dicing data, is done in a relational database, Rubin said.
Granularity matters. "If we look at the granularity of the data, we keep the lowest level of grain in our Hadoop system. So whenever you want to look at something at the lowest level of detail, Hadoop is optimized for that," he said. "However, if we want to look at transformed data and aggregated data, relational is easier for doing that."
And timing is important. All of Facebook's data streams directly into Hadoop, which can be used for real-time monitoring. But if the analytics team wants to do trending analysis over several days, weeks, months or years, "relational is a better technology," he said.
Not surprisingly, open data was a central theme at Strata Conference + Hadoop World. The work of Shawndra Hill, assistant professor at the University of Pennsylvania, on the intersection of tweets and TV was a prime example. Social television, according to Hill, is going to be "a $256 billion business by 2017."
She's looking into how Twitter can spur viewer engagement for television shows and advertisers. She's also using data sets from GetGlue and Viggle, apps that let viewers "check into" a television show the same way they would check into a location on Foursquare. Combining this kind of data with tweets might just become the next generation of Nielsen ratings.
"Can we predict customer lifetime value for shows and the network? Can we measure time shifting -- so for which shows are people checking in when the show is aired for the first time and which shows are people waiting to watch?" she said. And -- so critical to advertisers -- can it be done "at the individual level as opposed to the household level?" Stay tuned.
"You can use science and technology and statistics to figure out what the answers are, but it's still an art to figure out what the right questions are." -- Ken Rubin, director of analytics, Facebook
"If you have more eyeballs working on data, you're more likely to get better insights and better analysis." -- Michael Chui, researcher, McKinsey Global Institute
"It took Facebook around nine months to achieve the same number of subscribers/users as it took the radio community 40 years to achieve." -- David Parker, vice president of big data technologies, SAP
"How much investment is going into big data? Venture capital money, last count I saw, is about $2.6 billion. That's the equivalent of a Navy destroyer coming after your wallet." -- John Choi, director of product management, IBM
"Big data doesn't really exist. How do I know? It is a long truth in technology that anything that appears in the press in capital letters and surrounded by quotes isn't real." -- Douglas Merrill, CEO and founder, ZestFinance
"When we're talking about data science -- and big data as well -- one of the fundamental principles we should keep in mind is that data should be thought of as an asset." -- Foster Provost, professor of information systems, New York University's Stern School of Business
"Hadoop is one of the top 10 fastest growing technologies overall in terms of job growth." -- Jack Norris, chief marketing officer, MapR Technologies
(Of course, he would say that.)
This was first published in November 2013