In the first part of this interview, State Street Corp. Chief Scientist David Saul gave us a primer on semantic technology. Here, he talks about the challenges and tradeoffs of getting a semantic database off the ground, and how semantic technology could produce really big benefits for many companies, thanks to his and others' efforts now under way to develop standards.
SearchCIO.com: How long have you been interested in leveraging a semantic database?
Saul: My interest in the underlying technology goes back to the very early days of the Internet -- and this was before I came to State Street -- when I was working with universities.
But it is probably only in the last year and a half [that] I've seen practical tools that we could actually use to build some proofs of concept, do some pilots and get started.
What were some of the challenges you've experienced in getting this going?
The first challenge I had -- and it is one that I actually enjoy and is part of my job -- was just going around and talking to people about what is this semantic technology. Is it real? Is it something that can actually be exploited? Like any new thing, there is a bunch of terminology -- I might even say jargon. In semantic technology, the benefits are most directly realized by the end user. So, I found myself talking to people on the business end, asking them about things like this risk aggregation problem. What is it that they really need to do?
They would describe to me, "Well, we have these situations coming along all the time where we have to put together lots of different information, and we have to put it together very, very quickly. The traditional model -- going to the IT organization and having them lay out a large project and get back to us in months -- is not what we are looking for. We need to be able to do this in a much shorter period of time."
What tools are you using to do this?
There are actually a number of companies that are specifically working on providing semantic technology end-user tools. A lot of them are working in financial services, because that is a pretty obvious space. Some of them are also working in the pharmaceutical industry. Think about the problem they have with new drugs and the large amounts of data that may be mapped in different ways. Overlaying semantics on top of that means that they can do the analysis more quickly. You actually can't go to a biotech conference these days and not hear something about semantics.
Who actually built the semantic overlay at State Street?
There is a new set of jobs which are slightly different from the traditional jobs in the data space. So companies, ourselves included, are creating jobs like data scientist, someone who is able to bridge between the business and the technology to help people do these semantic layouts.
Is it hard to find a person with that combination of technology expertise and understanding of the business?
That is probably one of the things that is holding us back from moving more quickly on deploying this technology.
So, with your risk repository, once you have it and start building it out, who has access to it, and how is it manipulated?
That's a great question, because with the risk repository, you now have to put in a higher degree of security and controls since you have now aggregated more information. If that data were breached, because you have more things in one place, it could potentially be more serious.
One of the other things that semantics provides is that you have now tagged each element of information with its meaning. Part of that meaning is what kind of access controls you want to place on it. So, now you're not only looking at what this data represents, but that tag also has the potential to include the source. It might be information that came from an external source, so it is tagged with that as well.
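As an illustration only, the idea of tagging each element with its meaning, its source and its access controls can be sketched in a few lines of Python. This is a minimal sketch, not a description of State Street's system: the class name `TaggedValue`, the function `visible_total`, the role names and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedValue:
    """One data element carrying its own semantic tags (hypothetical model)."""
    value: float
    meaning: str              # what the number represents
    source: str               # where it came from, e.g. internal or external
    allowed_roles: frozenset  # which roles may read this element


def visible_total(items, meaning, role):
    """Sum the values with the given meaning that this role is allowed to read."""
    return sum(
        item.value
        for item in items
        if item.meaning == meaning and role in item.allowed_roles
    )


# Invented sample data: two exposure figures and one collateral figure.
repository = [
    TaggedValue(120.0, "counterparty_exposure", "internal",
                frozenset({"risk_officer", "auditor"})),
    TaggedValue(80.0, "counterparty_exposure", "external_vendor",
                frozenset({"risk_officer"})),
    TaggedValue(45.0, "collateral_value", "internal",
                frozenset({"risk_officer", "auditor"})),
]

# The same question gets a different answer depending on who is asking.
print(visible_total(repository, "counterparty_exposure", "risk_officer"))
print(visible_total(repository, "counterparty_exposure", "auditor"))
```

Because the access rule travels with each element rather than living in a separate system, an aggregation across sources can apply it at the moment the data is combined.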
Would it make Sarbanes-Oxley compliance potentially easier?
It has the potential to do that. Again, you've got to go back through and do that semantic mapping. Just to give you a balance, what I have been describing does require extra storage and it does require extra processing, so it is not free. You now can combine it with some of the other technologies that are coming along that are lowering the base cost of computing -- things like the cloud or data virtualization. But again, it goes back to your earlier question, "Why are we only seeing this now?" Well, there is a cost associated with it. The tradeoff of additional processing and storage versus being able to aggregate this information more quickly is a tradeoff we'll make anytime.
Where do you see semantic technology going for the bank, and where do you think it is heading in general? I just read about a company called Factual, which wants to catalog every fact in the world. It's hard to imagine.
What you are identifying is the absolute explosion of data that is being created, big data. Big data is just a statement about quantity. We all know about quantity, and there are some tools that have been developed to address that. A lot of them are being used in the consumer space to analyze things like spending patterns. I am all for doing that and reducing the amount of data to a more useful level.
The term I like -- and I didn't coin this term; it was done by some financial colleagues also working in semantics -- is smart data. I really like that term because it talks not just about the quantity, it talks about the intelligence of the data; that really is what semantics is about. I think some people might be scared away by the term semantics and might find smart data more attractive. Several of us got together recently in Cambridge to form a special interest group around smart data.
What remains to be done on semantic technology?
Clearly we will get immediate benefits internally, but the larger benefits will be if we can get this adopted by our clients, by other partners in financial services, under a standard structure. The EDM Council [Enterprise Data Management Council] has a draft proposed standard, which is documented in semantic terms for over-the-counter derivatives. This maps -- independent of the organization -- all of the elements, the meanings, the relationships with over-the-counter derivatives. This is being proposed as a standard for regulatory reporting under the Dodd-Frank Act for that space. If something like that is adopted and then is accepted by the regulators, I think you're going to see this technology take off very, very quickly.
Let us know what you think about the story; email Linda Tucci, Senior News Writer.