Data lake adoption can produce numerous business benefits, but data lake implementation certainly isn't free of challenges, according to Sumit Sarkar, chief data evangelist at software company Progress. Sarkar spoke with SearchCIO at the recent Cloud Expo in New York. He offered insight on data lake implementation strategies and described the challenges his team faced when executing a pilot project. He also shed light on why cybersecurity considerations such as privacy, data security and compliance with regulations like the General Data Protection Regulation (GDPR) must be taken into account when feeding personally identifiable information (PII) into a data lake.
Read excerpts of the interview below.
What challenges did you face with data lake implementation?
Sumit Sarkar: The data lake was a pilot project. One of the challenges is that we are currently running it on limited hardware. We can move the project into production only after it reaches a certain level of maturity. I don't report into the IT group, but IT generally runs the Progress infrastructure. We don't have buy-in from Progress yet, because we have to prove it works first. We would prefer to move to the cloud, but we hold sensitive customer PII. And while CIOs and CMOs share data, they have to work together to ensure that the right governance, data privacy and security are in place.
Is there anything you would have done differently knowing what you know now?
Sarkar: With data lake implementation, I don't want to say you just dump a bunch of data into a data lake and see what happens, but that's kind of what we did. Knowing what I know now, we would have taken some measures to address things like the infrastructure; we were running on very limited hardware. We could have leveraged cloud computing and talked to more people in the organization, because everybody has an opinion on how they might want to use it. It would have been good to know these things in advance. But my message to folks is: Don't wait; keep experimenting with the latest technologies.
What cybersecurity considerations should people take into account when it comes to data lake implementation?
Sarkar: For us, at the pilot level, we masked any PII before we dumped it into the data lake. Some of the things we tried turned out to be valuable, and we would like to unmask that data, but that raises data privacy issues, such as what data we can and cannot store. With the GDPR coming into force in Europe in 2018, there are different rules.
Once we move this into production, we'll have our security architecture team review the data to see if it meets Progress's criteria. We also want to make sure that our CIO and CMO have visibility into what we're doing and whether it complies with the regulations.