CIO Decisions

Why data lake governance is key to modern data architecture

Data lake governance is crucial to big data management

Data lakes have become a major boon for companies as they seek to more effectively manage and analyze big data. But without proper data lake governance, they can easily lead to disaster.

Data lakes, which are storage repositories and processing systems like Hadoop that can hold massive amounts of raw data, have become vital to big data management as companies look to exploit ballooning big data of various shapes and sizes for actionable business intelligence.

Such companies include Colony Brands, a direct retail marketer in Monroe, Wis. According to Steve Cretney, Colony's CIO, the majority of the company's hundreds of terabytes of storage sat idle in cold storage, while only a small fraction was used for business analysis. While historically a company might have approached this discrepancy using a data warehouse, Cretney went one step further: He moved Colony's data warehousing function to the cloud -- and opened up the company's data with a data lake.

Like Colony, other companies are turning to data lakes as a cheaper and more flexible way to leverage complex data than traditional data warehouses and extract, transform and load (ETL) tools, according to the experts Senior News Writer Nicole Laskowski spoke with for the cover story of this month's CIO Decisions.

"Data warehouses and databases … are too expensive and too constrained by storage and performance," said Phil Shelley, adviser and director at DataMetica Solutions in India. But with tools like Hadoop, "we can bring in all of the [historical] and new data, in near-real time, without the use of legacy ETL tools, into one single place," he said. Plus, data lakes' lack of data structure offer flexibility to data scientists because they can analyze data without a predefined schema, according to Joe Caserta, founder and president at Caserta Concepts in New York.

But despite their many benefits, data lakes also come with risks -- including losing meaningful context or valued customers -- if they are not properly governed. Data lake governance includes not just data cataloging, indexing and metadata management, but also overall governance, such as pinning down who is in charge of ingested data and properly defining data.
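As a rough sketch of how those governance elements might be recorded, the Python example below models a single catalog entry and flags what is missing. It assumes a very simple, hypothetical model (`CatalogEntry` and `governance_gaps` are made-up names), not the schema of any real catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata kept for one data set in the lake."""
    name: str
    owner: str = ""          # who is in charge of the ingested data
    description: str = ""    # the agreed definition of the data
    tags: list = field(default_factory=list)  # terms used for indexing


def governance_gaps(entry):
    """Return a list of the governance basics this entry is missing."""
    gaps = []
    if not entry.owner.strip():
        gaps.append("no owner assigned")
    if not entry.description.strip():
        gaps.append("no definition recorded")
    if not entry.tags:
        gaps.append("not indexed with tags")
    return gaps


if __name__ == "__main__":
    entry = CatalogEntry(name="web_clickstream_raw", owner="marketing analytics")
    print(governance_gaps(entry))
```

A check like this could run when data is ingested, so that a data set without an owner or a definition is caught before anyone relies on it.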

Dive into our cover feature to find out how CIOs can implement the right amount of big data management and data lake governance while still leveraging the flexibility data lakes provide. Plus, learn more about the semantic database, a type of data lake that uses the same standards and technology that are used to create Internet hyperlinks.

Email Francesca Sales, site editor, or find her on Twitter @Fran_S_TT.
