Architecting for big data

How do CIOs go about building enterprise architecture for big data? Start by looking beyond traditional stack vendors. The Data Mill reports.

Search and recommendation engines, such as Amazon's "customers who bought this item also bought" feature, changed how the world finds and interacts with information. But for many businesses, the technology behind the recommendation engine can often force them to make a costly tradeoff, said Scott Jarr, co-founder and chief strategy officer for the open source, in-memory database provider VoltDB.

Businesses have to choose between "timeliness versus accuracy," Jarr said during a recent O'Reilly Media webcast. Because time matters, highly personalized search results or shopping recommendations are often jettisoned for information that's good enough. It's a business problem that illustrates the distinction between big data insights and what Jarr calls fast data.

The gap between accurate and fast will only widen as big data gets bigger. As the Internet of Things (IoT) takes hold, IT departments will face more infrastructure bottlenecks. Jarr said the three most common points of congestion are ingesting more data from more new sources, building processes that access that data quickly enough to support data-driven decisions, and producing faster analytics for the business. Removing those roadblocks will "take fast data and start making it very smart data," he said.

The problem may be that IT has simply outgrown its legacy relational database management systems (RDBMS). And turning to the big data poster child, Hadoop/MapReduce, won't solve the fast data problem: technology that processes data in batch will buckle under the pressure of real-time analytics.

The Data Mill

For decades, RDBMS vendors designed their online transaction processing (OLTP) products with a kind of fast-food mentality. Michael Stonebraker, co-founder of VoltDB and a professor at MIT's Computer Science and Artificial Intelligence Laboratory, calls it a one-size-fits-all concept. Stonebraker, for one, stopped believing in the approach years ago, a position laid out in a 2007 paper titled "'One Size Fits All': An Idea Whose Time Has Come and Gone." He could see stream processing, sensor and semi-structured data, and real-time analytics approaching, and he knew a one-size-fits-all RDBMS, which stores information on disk, couldn't hack it.

"Mike's been saying this for a decade and a half: You have to purpose-build a database for those [fast data] applications," Jarr said during the webcast. That, at least, is the thinking behind VoltDB, a SQL relational database management system. It differs from traditional technology by keeping data in memory, which increases processing speed. Jarr and Stonebraker don't pretend their product is the big data silver bullet. In-memory technology, after all, doesn't have the capacity to act as a repository for all of a company's historical data. Instead, VoltDB becomes part of a newfangled big data analytics stack.

Small pieces loosely joined

Architecture matters. Indeed, the phrase is becoming a big data meme, as it grows increasingly apparent, not just to Jarr but to everyone working in big data, that the architecture of today has to look different from the architecture of yesterday. And tomorrow, when machines of all kinds start generating data, the shortcomings of traditional database systems will be even more glaring, said Mike Olson, co-founder and chief strategy officer at Cloudera, an Apache Hadoop distributor and service provider.

"It turns out machines are much better at generating data than you or I," he said at the recent MongoDB World in New York City. "It's why big data is happening; it's why industry is so quickly being transformed."

Rather than building a centralized data warehouse, a long-standing item on the CIO bucket list, Olson asked attendees to start thinking like this: "Systems will manage data where it's born." To accommodate big data and deliver business analytics, data centers need to "span firewalls," with some data living in the cloud, some living on-premises, Olson said, "and those systems will talk to each other."

Dave Weinberger's book Small Pieces Loosely Joined makes the point, he said. "The architecture of the Internet reflects a philosophy of systems," Olson said. "Things live where they're born, and they connect." Shouldn't IT build an infrastructure that looks and acts similarly?

"I thought, early in Cloudera's life, that one single system would rule them all, but I don't think so anymore," Olson said. Cloudera is only valuable if it can "talk to, interact with, collaborate with other systems that are specialized" in the work they do. "Small pieces need to be loosely joined," he said.

And so IT "will run IoT data systems in the cloud, transactional and consumer systems internally," Olson said. "You'll build authentication and identity management systems in the places they need to be. And you will stitch them together."

Olson and Cloudera call the idea the enterprise data hub, which sounds a little like Gartner's logical data warehouse.

'One size fits none'

So are the days of the single-stack IT shop really over? Certainly, the VoltDBs, Clouderas and MongoDBs of the marketplace are making that case.

All three companies hope to help CIOs build agile systems that deliver fast and accurate business intelligence and analytics. They argue that the one-size-fits-all data center infrastructure IT departments have used to grow the business can't be the infrastructure that carries them forward, especially if they are to tap new data streams and produce more granular analytics.

"One size fits none," Stonebraker said during the O'Reilly Media webcast.

Welcome to The Data Mill, a weekly column devoted to all things data. Heard something newsy (or gossipy)? Email me or find me on Twitter at @TT_Nicole.

Next Steps

Will the enterprise data hub become the heart of the enterprise?

Agility marks MongoDB enterprise use cases

VoltDB ups velocity for big data apps
