Organizations looking to implement a data lake project should start exploring its potential business benefits and not wait for the technology to be perfect, according to Sumit Sarkar, chief data evangelist at Progress Software Corp. Sarkar spoke with SearchCIO at the recent Cloud Expo in New York, where he highlighted the benefits that his company has achieved from building a marketing data lake. He also shed light on how creating a marketing data lake has helped Progress gain better customer insight and offered best practices for data lake implementation.
Read excerpts of the interview below.
You recently built a data lake. What benefits have you seen?
Sumit Sarkar: At Progress, we have data in a lot of different places. We have cloud CRM data, we have cloud marketing automation data, we have survey data, we have web [transaction] data, and each is locked behind a different API. We warehoused all of that data using our own connectivity products and built a marketing data lake as a pilot project.
I think the minds of the CIO and CMO are starting to converge in that they think, 'Let's not wait for technology to be perfect; let's start exploring.' That's what we did at Progress. We built this marketing data lake, and as a result we can do things like this: Say we scanned X number of leads at a trade show, and those leads later interact with our website. Now we can run a little query and ask, 'What pages did they consume after coming to our talks at the show? What time did they do that? What did they interact with later?' We are taking our data from a bunch of different areas and really answering these questions at marketing speed.
But you don't have to have a predefined set of questions and reports. The question changes depending on what you are trying to do. You can just get the data staged and get it to your subject matter experts, and they can then query it and say, 'How many people who consumed this content also did this sequence of events?' It's really fun for us to ask any question we want at marketing speed.
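As a rough illustration of the kind of ad hoc question Sarkar describes, the sketch below joins trade show lead scans with web page views to list what each lead viewed after being scanned. Everything in it is an assumption made for illustration: the table names `leads` and `page_views`, their columns, and the sample rows are hypothetical and are not Progress's actual schema or tooling, and an in-memory SQLite database stands in for whatever query engine sits on top of a data lake.

```python
import sqlite3


def pages_viewed_after_show(conn):
    """Return (email, page, viewed_at) for page views that follow a lead's badge scan."""
    query = """
        SELECT l.email, v.page, v.viewed_at
        FROM leads AS l
        JOIN page_views AS v ON v.email = l.email
        WHERE v.viewed_at > l.scanned_at   -- ISO 8601 strings sort chronologically
        ORDER BY l.email, v.viewed_at
    """
    return conn.execute(query).fetchall()


# Hypothetical staging tables; a real lake would hold far more raw detail.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (email TEXT, scanned_at TEXT);
    CREATE TABLE page_views (email TEXT, page TEXT, viewed_at TEXT);
""")

# Made-up sample data, not taken from the interview.
conn.executemany("INSERT INTO leads VALUES (?, ?)", [
    ("ana@example.com", "2024-01-10T10:00:00"),
    ("bo@example.com", "2024-01-10T11:30:00"),
])
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", [
    ("ana@example.com", "/pricing", "2024-01-09T09:00:00"),    # before the scan
    ("ana@example.com", "/docs/odbc", "2024-01-11T14:15:00"),
    ("bo@example.com", "/blog/data-lake", "2024-01-10T18:00:00"),
    ("cy@example.com", "/pricing", "2024-01-11T08:00:00"),     # never scanned
])

rows = pages_viewed_after_show(conn)
for email, page, viewed_at in rows:
    print(email, page, viewed_at)
```

Because the raw events are staged without a fixed report in mind, a different question (for example, counting leads who followed a particular sequence of pages) would only need a different query over the same tables.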
How do you know what data to feed into that data lake?
Sarkar: I think a lot of organizations, with a data lake, are trying to dump in as much as they can. We're dumping gigabytes of data in there, but we might take just a couple of gigs when we ask a question. But with Hadoop technologies and big data technologies, it's cost-effective to dump as much as you want.
At Progress, we have some detailed activity data in our data lake that would have been too expensive for us to store in our corporate warehouse. But with the new scale-out architectures of the big data ecosystem, it's now cost-effective. We can try more things and experiment. We have acquired DataRPM as part of our cognitive strategy, and we now have that sophistication: If we give them access to our data lake, they can then run their own algorithms. It's exciting to just put it out there and see what happens.