This content is part of the Essential Guide: Structuring a big data strategy


Data warehousing architecture gets a big data makeover

In the digital age, data warehousing architecture shies away from the monolithic enterprise data warehouse in favor of flexibility.

In the blink of an eye, it seems, all business is digital business, and no company wants to be late to the digital revolution. Businesses across industries -- not just your online giants -- are building websites, tinkering with Twitter and developing mobile applications to get a little closer to their customers and the rich data those customers are generating.

But the transition to the digital age hasn't been simple for many CIOs. IT departments that weren't born online like those of the eBays and the Amazons of the world have been challenged just to keep pace with digital data. The sheer variety of data that businesses now routinely collect -- clickstream, images, video, text -- is overwhelming traditional data management infrastructures, forcing IT departments to find ways to exist at the brink or to move beyond it. For some businesses, that has meant identifying and collecting only the most valuable data; for others, it has meant introducing new technology to the data architecture.

Take Cabela's Inc., the outdoor sporting goods purveyor whose outsized stores are as much tourist attraction as shopping outlet. The Sidney, Neb.-based retailer, which runs one of the world's largest direct marketing operations, is well acquainted with the challenges of digital business. For years, the company has been collecting clickstream data, which captures how visitors move through the business's website, or what the industry now calls the customer's "electronic body language." Unlike pure transactional data, clickstream includes a mix of structured, relational data and non-relational data such as URLs and tags, a combination that an enterprise data warehouse (EDW) isn't optimized to handle.

"We're forcing big data into a relational mode and put it into a data warehouse," said Dean Wynkoop, director of data science for Cabela's.

In the hopes of easing the pain and broadening its data collection, Wynkoop began working on proofs of concept in 2011 to augment the company's EDW with new technology such as Hadoop, an open source framework for distributed storage and processing that shines at handling non-relational data sources. But more than 12 months later, Cabela's has still not pulled the trigger. Today, Wynkoop and his team continue to capture the data that yields the highest value to the business, letting the grainier, noisier digital data go.

"We've had to take the approach of getting what we need and discarding the noise," he said. "There probably is value in that noise, but you have to have the right architecture to get it and to pull that value out."

Cabela's is not alone in having to make hard choices as it makes the technology changes required to compete in the digital age.

"We are at an inflection point in data management that's very similar to what we saw in the early '80s, when client server and relational databases changed the world," said Mark Beyer, analyst with Stamford, Conn.-based consultancy Gartner Inc.

It's an inflection point that has CIOs and IT departments feeling pressure, especially as anecdotes on how those online giants are diving ever deeper into digital data increasingly frame any big data discussion.

"We are asking people for magic," Gartner analyst Merv Adrian said. "We're telling them there's good data over there in their weblogs, and to just put it in their data warehouse. Everything's going to be great."

From first draft to fit for publication

While big data has prompted some experts to predict the death of the data warehouse altogether, a more practical discussion on extending the EDW with big-data-friendly technologies is currently unfolding. Vendors such as IBM, Oracle and Microsoft, for example, have developed connectors to the open source Hadoop platform. The name for the new data management approach varies according to the vendor or consultant doing the talking, but the gist of the solutions is the same: ecosystems that mix traditional and non-traditional technology.

The next 12 months

Mark Beyer, an analyst at Gartner Inc., asked during a Magic Quadrant survey if respondents planned to put big data into the data warehouse in the next 12 months:

  • 55% of respondents said no;
  • 17% of respondents said yes, but they didn't have a plan yet; and
  • 28% said they had a plan, which involved modifying the ETL process (14%) or running summaries on big data queries (9%).

Source: Gartner Inc.

While Cabela's Wynkoop envisions a system of repositories separated by how efficiently they process and store different data types, Gartner's approach takes it one step further with its logical data warehouse. The consultancy agrees that multiple data containers are important, but it also emphasizes data management and access.

"We are hearing, on an almost continuous basis, complaints about data warehouse flexibility," said Beyer. "It's actually not the fault of the data warehouse platform. It's usually the fault of the design. Just because I have a hammer doesn't mean I should make everything a nail. But that's what we did."

EDWs are extremely efficient with transactional data, but that singular focus has also hemmed in their capabilities. In a world of increasingly varied data, businesses need more. Gartner's logical data warehouse extends the power of the EDW by including data virtualization, which can virtually integrate data from disparate sources, and distributed processing, such as MapReduce, for added flexibility and exploration capabilities.
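
For readers unfamiliar with MapReduce, the pattern can be illustrated in a few lines. The following is a minimal, single-machine sketch in Python, not Hadoop code and not anything Gartner or Cabela's has described: the sample URLs, the `CLICKS` list and the function names are all hypothetical. It counts clicks per top-level site section, with a map step that emits key-value pairs, a shuffle that groups them by key and a reduce step that sums each group.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical clickstream records: (visitor_id, url). Not real data.
CLICKS = [
    ("v1", "https://example.com/tents/4-person?ref=email"),
    ("v2", "https://example.com/tents/2-person"),
    ("v1", "https://example.com/boots/hiking"),
    ("v3", "https://example.com/tents/4-person"),
]


def map_click(record):
    """Map step: emit (site section, 1) for a single click."""
    _visitor, url = record
    path = urlparse(url).path
    # First path segment is treated as the site section; empty path means home page.
    section = path.strip("/").split("/")[0] or "home"
    yield section, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse one key's values into a total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_click(record))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(CLICKS))
```

In a real distributed system, the map and reduce steps would run in parallel across many machines; the sketch only shows the shape of the computation.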

"Sometimes users just want to see the data, and you want to swim around in it for a little while -- see if you bump into anything," Beyer said. "And so what IT needs to do is give some way the users can see the data very quickly and let them figure out what the rules are about the data."

Quickly accessing data through a virtualization tool is like writing a first draft of a story or report, Beyer explained. If the draft proves to be fruitful, the results can be formally published to the EDW. In this model, metadata management plays a key role by keeping track of the details of the drafts that make the cut and those that aren't good enough to publish. Along with easy access, the new data management concept also introduces a way to push beyond the transactional data analytics businesses typically deal with and take on more non-traditional data types.

Architecture alternative

T-Mobile USA Inc. has acknowledged the value of collecting non-traditional data types with a new initiative squarely aimed at the bane of the telecommunications industry: customer churn. In this case, T-Mobile isn't going after brand new types of data but instead combing through the trove of data that is the bedrock of its business: its account memo data -- the record of all the interactions the company has with a customer -- from electronic timestamps to service representatives' notes. Like Cabela's clickstream data, account memos are a mix of relational and non-relational data structures.

"We've struggled for quite some time in getting value out of this that's meaningful [and helps explain] what happened in this experience with this customer," Jesse Lynch, director of IT development at the Bellevue, Wash., company, said at a recent Gartner conference.

Testing out a proof of concept for Teradata's Unified Data Architecture, another example of a hybrid ecosystem, T-Mobile has been able to explore account memo data outside of the EDW in a more flexible discovery platform. Before, T-Mobile analysts could observe how long a customer was on the line, but now they can pinpoint where major account changes occurred, as well as zero in and dig into what Lynch called big, gnarly interactions with customers, such as the number of times a customer was transferred. That kind of data can be extrapolated out and then spread over the customer base as a whole to identify potential patterns: Are multiple transfers happening frequently? How do multiple transfers impact churn? The unified system helps the company ask -- and answer -- new questions about customer interactions.
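
To make the transfer question concrete, here is a rough Python sketch of how such an analysis might look. It is purely illustrative and is not T-Mobile's or Teradata's method: the event format, the `MEMO_EVENTS` and `CHURNED` sample data, the function names and the two-transfer threshold are all assumptions. It counts transfers per customer, then compares churn rates between customers with multiple transfers and those with few.

```python
from collections import defaultdict

# Hypothetical account memo events: (customer_id, event_type). Not real data.
MEMO_EVENTS = [
    ("c1", "call_start"), ("c1", "transfer"), ("c1", "transfer"), ("c1", "transfer"),
    ("c2", "call_start"), ("c2", "transfer"),
    ("c3", "call_start"),
    ("c4", "call_start"), ("c4", "transfer"), ("c4", "transfer"),
]

# Hypothetical set of customers who later left.
CHURNED = {"c1", "c4"}


def transfers_per_customer(events):
    """Count 'transfer' events for every customer seen in the memo data."""
    counts = defaultdict(int)
    for customer_id, event_type in events:
        counts[customer_id] += 1 if event_type == "transfer" else 0
    return dict(counts)


def churn_rate_by_bucket(transfer_counts, churned, threshold=2):
    """Compare churn rates for customers at or above the transfer threshold
    with those below it. The threshold is an illustrative assumption."""
    totals = {"multiple_transfers": [0, 0], "few_transfers": [0, 0]}
    for customer_id, transfers in transfer_counts.items():
        bucket = "multiple_transfers" if transfers >= threshold else "few_transfers"
        totals[bucket][0] += 1  # customers in bucket
        totals[bucket][1] += 1 if customer_id in churned else 0  # churned in bucket
    return {
        bucket: (lost / size if size else None)
        for bucket, (size, lost) in totals.items()
    }


if __name__ == "__main__":
    counts = transfers_per_customer(MEMO_EVENTS)
    print(counts)
    print(churn_rate_by_bucket(counts, CHURNED))
```

A gap between the two rates would only suggest a pattern worth investigating; it would not by itself show that transfers cause churn.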

"This experience stuff, we haven't been able to get effectively. So this really allows us to open up our eyes," said Lynch. "It's important to get those right tools in the right situation, and we know it's not the traditional set of tools we're using anymore."
