Big data has become part of the business lexicon, but that doesn't mean it's easy to do. Part one of this expert tip looked at how CIOs can help their departments break out of traditional IT thinking to get a big data project off the ground. Part two is a shortlist of big data management recommendations for keeping the project on track. Bottom line: CIOs must recognize their role as information stewards.
"Following principles of information stewardship encourages you to take a proactive approach to … managing the data through its entire lifecycle -- from its acquisition to what you decide to do with it at end of life," said John Burke, CIO and principal research analyst at Mokena, Ill.-based Nemertes Research.
Burke breaks information stewardship into five categories: information protection, data quality management, disaster resistance, information lifecycle management and compliance.
The advice came out of a three-hour course on the topic of managing big data given by Burke and his colleague Johna Till Johnson, president and founder of the consultancy.
1. Remember that quality is a continuum in big data
While big data's definition continues to evolve, experts still point to data growing in volume, variety and velocity as key characteristics. Burke and Johnson push businesses to think beyond the original three V's and consider the data's potential value to the business. This requires breaking from traditional rules about data quality.
"The old mind-set of [data quality management] is that you want all of the data as pure as the driven snow, and if it's not, it's irredeemably flawed," said Johnson. "With big data, you have to start thinking in terms of a continuum."
In some cases, the importance of the data will lie not in its pristine quality but in the kinds of questions it can help answer. That is certainly true of the opinions found in social media data, which often cannot be verified, yet still can provide value to the business, she said.
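One way to picture the continuum Johnson describes is to give each record a quality score rather than a pass/fail flag, and let each use case set its own bar. The following is a minimal sketch under stated assumptions: the `Record` class, the 0.0-to-1.0 scale, the sample data and the thresholds are all hypothetical illustrations, not anything Burke or Johnson prescribed.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str
    text: str
    quality: float  # hypothetical scale: 0.0 (unverified) to 1.0 (fully verified)


def usable_for(records, min_quality):
    """Return the records whose quality score meets a use case's threshold."""
    return [r for r in records if r.quality >= min_quality]


# Hypothetical sample data, invented for illustration only.
records = [
    Record("billing_db", "invoice 1001 paid", 0.95),
    Record("social_media", "love the new product", 0.30),
    Record("call_center_notes", "customer asked about refunds", 0.60),
]

# Assumed thresholds: financial reporting wants near-pristine data,
# while sentiment analysis can tolerate unverified opinions.
financial = usable_for(records, min_quality=0.9)
sentiment = usable_for(records, min_quality=0.2)
```

Here the social media record is excluded from financial reporting but still feeds the sentiment analysis, which is the kind of value Johnson says unverifiable data can provide.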
2. Don't be wedded to stack vendors
Although stack vendors want you to believe otherwise, it's still the wild, wild West when it comes to big data technology, Johnson said. What the Oracles and IBMs of the world are offering won't measure up to business need, in her experience.
"You are trusting them to know your business requirements, and I can guarantee you they do not," she said. "We are probably five years away from the point when you can raise your hand and get a whole suite and framework that's tightly integrated."
Instead, Johnson suggests having "owners" for every stage of the lifecycle framework -- data acquisition, classification, management, analysis, storage, end-of-life and security -- and having them perform an "a priori analysis" to figure out where the needs are. Fair warning: That's going to produce "a giant hairball" because owners of acquisition, for example, will likely be drawn to different vendors and tools than owners of classification and management, she said.
"You're going to find that the answers may drive point products or point solutions or combinations of point products and point solutions that you then invest in integrating to get this whole lifecycle," she said.
3. Integrate, integrate, integrate
Point solutions will force your big data team to think integration. Open source tools, such as Hadoop, present yet another challenge to managing big data. While it's possible to take on big data using only open source tools, Johnson issues a reminder to be on the lookout "for that part where open source bites you in the butt" because of the lack of support and the stickiness of stitching these things together.
"We went from an entirely open source infrastructure to one that's largely based on commercial products," she said. "Even though we're paying more upfront for license fees than we were previously, we're paying less in support and integration, and stuff breaks less often."
Still, she said, until stack vendors have a handle on managing big data, companies should expect to participate in the "delicate dance" between open source tools and commercial vendors.
4. Tag, tag, tag
When embarking on information lifecycle management, give proper weight to the first step: classification of the data, said Burke. That's where things like quality checks, security parameters and other tags, such as location, are tied to data acquisitions.
"A lot of the rest of what goes on in lifecycle management for data hinges on classifying stuff as you acquire it," he said.
This is also where IT needs to reach across the aisle and work with the business. "Classification is one of those things where IT can and should do some of it, but can't do a lot of it because IT is not the owner or the user of the data," Burke said. "This is one of those places where having the cross-functional team and having the involvement of folks outside of IT on that team really comes into play."
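The acquisition-time classification Burke describes could be sketched as attaching tags to each record the moment it enters the system, so later lifecycle stages can act on them. This is only an illustration under stated assumptions: the `TaggedRecord` class, the tag names, the security levels and the sample record are hypothetical, and in practice the business owners of the data would supply many of the values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TaggedRecord:
    payload: dict
    tags: dict = field(default_factory=dict)


def classify_on_acquisition(payload, source, location, security_level, quality_checked):
    """Attach classification tags to a record at the point of acquisition."""
    tags = {
        "source": source,
        "location": location,
        "security": security_level,          # hypothetical levels, e.g. "internal"
        "quality_checked": quality_checked,  # whether a quality check has been run
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }
    return TaggedRecord(payload=payload, tags=tags)


def needs_restricted_storage(record):
    """Example downstream decision that relies on the acquisition tags."""
    return record.tags["security"] in {"confidential", "restricted"}


# Hypothetical record, invented for illustration only.
record = classify_on_acquisition(
    payload={"customer_id": 42, "comment": "slow checkout"},
    source="web_feedback_form",
    location="US",
    security_level="internal",
    quality_checked=False,
)
restricted = needs_restricted_storage(record)
```

Because the storage decision reads only the tags, it shows why so much of the later lifecycle hinges on getting classification right up front.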