CIO Decisions

Data lake governance is crucial to big data management
Data lakes have become a major boon for companies as they seek to more effectively manage and analyze big data. But without proper data lake governance, they can easily lead to disaster.
Data lakes, which are storage repositories and processing systems like Hadoop that can hold massive amounts of raw data, have become vital to big data management as companies look to exploit ballooning big data of various shapes and sizes for actionable business intelligence.
Such companies include Colony Brands, a direct retail marketer in Monroe, Wis. According to Steve Cretney, Colony's CIO, the majority of the company's hundreds of terabytes of data sat idle in cold storage, while only a small fraction was used for business analysis. While historically a company might have addressed this gap with a data warehouse, Cretney went one step further: He moved Colony's data warehousing function to the cloud -- and opened up the company's data with a data lake.
Like Colony, other companies are turning to data lakes as a cheaper and more flexible way to leverage complex data than traditional data warehouses and extract, transform and load (ETL) tools, say the experts whom Senior News Writer Nicole Laskowski spoke with for the cover story of this month's CIO Decisions.
"Data warehouses and databases … are too expensive and too constrained by storage and performance," said Phil Shelley, adviser and director at DataMetica Solutions in India. But with tools like Hadoop, "we can bring in all of the [historical] and new data, in near-real time, without the use of legacy ETL tools, into one single place," he said. Plus, data lakes' lack of data structure offers flexibility to data scientists because they can analyze data without a predefined schema, according to Joe Caserta, founder and president at Caserta Concepts in New York.
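To make "without a predefined schema" concrete, here is a minimal sketch of the schema-on-read idea in plain Python. It is an illustration only, not how Hadoop or any particular data lake product works: the sample records, field names and the `read_with_schema` and `total_by_customer` helpers are all hypothetical. Raw records are stored as they arrive, and each analysis picks the fields it needs at read time.

```python
import json

# Hypothetical raw records, stored as-is. They share no fixed schema.
RAW_EVENTS = [
    '{"customer": "a1", "order_total": 42.5, "channel": "web"}',
    '{"customer": "b2", "order_total": 10.0}',
    '{"customer": "a1", "clicks": 7, "channel": "email"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep only the requested fields,
    using None where a record does not carry one of them."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def total_by_customer(raw_lines):
    """One possible analysis: sum order totals per customer,
    skipping records that have no order_total field."""
    totals = {}
    for row in read_with_schema(raw_lines, ["customer", "order_total"]):
        if row["order_total"] is None:
            continue
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["order_total"]
    return totals


if __name__ == "__main__":
    print(total_by_customer(RAW_EVENTS))
```

A different analysis, say clicks per channel, could call `read_with_schema` with a different field list against the same raw records, which is the flexibility the paragraph above describes.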
But despite the many benefits of data lakes, they also come with risks -- including losing meaningful context or valued customers -- if they are not properly governed. Data lake governance includes not just data cataloging, indexing and metadata management, but also broader oversight, such as pinning down who is in charge of ingested data and properly defining that data.
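As a rough illustration of those governance elements, the sketch below models a tiny data catalog in Python. Everything in it is an assumption made for the example: the `CatalogEntry` and `Catalog` names, the fields and the sample path do not come from any real catalog product. It simply refuses to register a data set unless an owner and a definition are supplied, and it indexes entries by tag so they can be found later.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata kept for one ingested data set."""
    path: str          # where the raw data lives in the lake
    owner: str         # who is in charge of the data
    definition: str    # what the data means, in plain language
    tags: list = field(default_factory=list)


class Catalog:
    """A minimal in-memory catalog that enforces ownership and definitions."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Governance rule for this sketch: no owner or definition, no ingestion.
        if not entry.owner.strip():
            raise ValueError(f"{entry.path}: an owner is required")
        if not entry.definition.strip():
            raise ValueError(f"{entry.path}: a definition is required")
        self._entries[entry.path] = entry

    def owner_of(self, path):
        return self._entries[path].owner

    def search(self, tag):
        """Return the paths of all entries carrying the given tag, sorted."""
        return sorted(p for p, e in self._entries.items() if tag in e.tags)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(CatalogEntry(
        path="/lake/raw/orders",
        owner="sales-ops",
        definition="One row per customer order, as received from the web store.",
        tags=["orders", "customer"],
    ))
    print(catalog.owner_of("/lake/raw/orders"))
    print(catalog.search("customer"))
```

A real deployment would persist this metadata and tie it to access controls; the point here is only that ownership and definitions are captured at ingestion time rather than reconstructed later.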
Dive into our cover feature to find out how CIOs can implement the right amount of big data management and data lake governance while still leveraging the flexibility data lakes provide. Plus, learn more about the semantic database, a type of data lake that uses the same standards and technology that are used to create Internet hyperlinks.
Email Francesca Sales, site editor, or find her on Twitter @Fran_S_TT.