In an industry beholden to short-term results, David Saul is paid to take the long view. As chief scientist at State Street Corp., he's responsible for proposing and assessing new,
advanced technologies for the Boston-based financial institution, and for evaluating the probable success of its IT strategy.
One of the advanced technologies very much on Saul's radar these days is a database approach he contends will go a long way toward turning big data into smart data: the semantic database.
Saul's current interest in semantic technology caps a 20-year career at State Street, where he has served as chief information security officer and, before that, directed enterprise architecture. An MIT graduate, he previously was with IBM's Cambridge Scientific Center.
In this first part of a two-part interview, we ask Saul to give us the basics of semantic technology, how the bank plans to use it and why not everyone is using it.
SearchCIO.com: Let's start with some basics. How does a semantic database work?
Saul: What a semantic technology does is to add meaning to the data that we already have. Think of it more as an overlay technology on the existing amounts of both structured and unstructured data that we have stored in files and databases. What semantic technology does is, it associates with every piece of data the meaning of that data, and that combination makes it very much more useful to us.
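Saul's "meaning attached to every piece of data" can be sketched with subject-predicate-object triples, the basic unit of semantic (RDF-style) data. This is an illustrative sketch only; the identifiers and `fibo:`-style prefixes are hypothetical, not anything State Street has described using.

```python
# Each fact is a (subject, predicate, object) triple; the predicate
# carries the meaning that a bare column value in a flat file lacks.
triples = [
    ("acct:1001", "rdf:type",      "fibo:Client"),
    ("acct:1001", "fibo:hasName",  "Acme Corp"),
    ("acct:1001", "fibo:domicile", "geo:US"),
]

# Stored on its own, "Acme Corp" is just a string; as the object of a
# fibo:hasName triple it is unambiguously the name of a client.
names = [o for s, p, o in triples if p == "fibo:hasName"]
print(names)  # ['Acme Corp']
```

The overlay aspect is that the original record can stay where it is; the triples add interpretation on top rather than replacing the store.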
Can you give me an example of an overlay?
We have many different places where we store information that might be related to one of our clients. These systems have been developed at different times by different people. And they may have different ways of storing data. What the semantic model puts on top of that is that it says, "This information that is describing a client over here, [and] this other information describing a client over here, are that same client."
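A minimal sketch of that "same client" assertion, assuming two hypothetical systems with different record shapes. In RDF terms this equivalence would typically be an `owl:sameAs` statement; here it is just a recorded pair:

```python
# Two systems, built at different times, describe the same client
# under different identifiers and field names.
system_a = {"cust:A17": {"name": "Acme Corp", "region": "US"}}
system_b = {"cli:9903": {"legal_name": "Acme Corporation"}}

# The semantic overlay records the equivalence without moving or
# restructuring either record (analogous to owl:sameAs in RDF).
same_as = {("cust:A17", "cli:9903")}

def merged_view(a_id, b_id):
    """Aggregate both systems' attributes for one real-world client."""
    view = dict(system_a[a_id])
    view.update(system_b[b_id])
    return view

print(merged_view("cust:A17", "cli:9903"))
```

Both source records are left untouched; the mapping alone lets a query treat them as one entity.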
One of the ways we think this will give us a great deal of benefit is in doing risk assessment and risk management. Very often we'll be asked, either by regulators or by some of our internal risk management people, to produce a report that tells us what our total exposure is to a particular entity, which might be a company or a geographic area. Doing that today is very slow and also very expensive. What we end up doing is [that] we'll take data feeds from multiple places that will be in different formats describing the risk information that we are trying to aggregate. A database person will build a risk repository (which is yet another database) that takes all the feeds in. Then someone has to go through and decide these two things really are equivalent to one another, and produce a series of reports.
If we do the same thing with semantic technology, we leave the data where it is. We've eliminated actually having to move the data. But with that semantic mapping, we're able to take multiple sources, aggregate them together, create either an ad hoc report or on an ongoing basis, a risk report. When it comes to something like risk, timing is very important. If we're able to get that information more quickly and anticipate a possible situation, it has the potential to prevent fairly significant financial loss in some cases.
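The leave-the-data-in-place aggregation Saul describes can be sketched as a one-time label-to-entity mapping queried across sources. The source systems, labels, canonical URIs like `ent:acme`, and the exposure figures are all invented for illustration:

```python
# Hypothetical exposure figures, left in their source systems, each of
# which labels the counterparty its own way.
trading_desk = [("ACME-US", 1_200_000), ("GLOBEX", 300_000)]
custody      = [("Acme Corp", 450_000)]

# One-time semantic mapping: each source's local label -> canonical entity.
entity_map = {
    "ACME-US":   "ent:acme",
    "Acme Corp": "ent:acme",
    "GLOBEX":    "ent:globex",
}

def total_exposure(entity):
    """Aggregate exposure to one entity across sources, without building
    a separate risk repository or moving any of the underlying data."""
    return sum(amount
               for source in (trading_desk, custody)
               for label, amount in source
               if entity_map[label] == entity)

print(total_exposure("ent:acme"))  # 1650000
```

The point of the sketch is structural: the aggregation logic is generic, and all source-specific knowledge lives in the mapping, which is written once.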
So the data stays where it is. How do you search a semantic database? Is it like putting a keyword into Google search?
That is actually a very good analogy. It's a good example because the technology that is being used in these semantic databases is exactly the same technology that we use today to link from one website to another website. So when you go into Google or any other website and you click on a link, the place that you're coming from really has no knowledge about the place that you are going to. It's dependent on a standard, which is the underlying HTTP that links everything together on the Internet. So, in the same way that the Internet allows you to connect two locations that have no knowledge about one another, semantic technology does the same thing, but it does it at the data level.
So, it's a search technology?
No, it is not a search technology. [Unlike in simple search] there is work involved, such as tagging each piece of data in order to do this semantic mapping. The advantage is you only do it once. When someone comes along and says "Oh, I need to add another data source" or "We have just done a merger and acquisition, and we have a whole new set of data that has to be added into it," in the old model, you're back to square one: The database people have to redesign the risk repository, for example. Any of the reports that you've done before on aggregation all have to be redone.
If you now do the same thing using semantic technology, the mapping that you have done for all the previous databases is completely valid -- you don't have to do that over. You only have to map now the semantics of this one additional source. So, any investment you make in doing this semantic mapping you're going to be able to leverage. And by the way, once you map a database semantically, you can use it not only in this risk reporting, for example, but also in any other place where you want to aggregate the data. There's nothing free in this world. There's the up-front work you have to do. But the important thing is you only do it once.
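The incremental point, that prior mappings stay valid and only the new source needs work, can be shown with the same kind of label-to-entity mapping; the source names and labels here are hypothetical:

```python
# Mappings already built for two existing sources -- a one-time investment.
entity_map = {
    "ACME-US":   "ent:acme",   # trading system's label
    "Acme Corp": "ent:acme",   # custody system's label
}

# A new source arrives, say via an acquisition. Only its labels need
# mapping; everything mapped earlier remains valid and untouched.
new_source_map = {"ACME CORPORATION LTD": "ent:acme"}
entity_map.update(new_source_map)

# All three labels now resolve to the same canonical entity, so every
# existing report over entity_map picks up the new source automatically.
print(entity_map["ACME CORPORATION LTD"])  # ent:acme
```

Contrast this with the repository model Saul describes, where adding a source means redesigning the repository and redoing the aggregation reports.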
Why isn't everyone doing it?
The tools to do that mapping and store it and do the aggregation are only just now starting to become available. So, while the underlying technology of the Internet has been around for a long time and has been well-proven -- we know that it can scale to millions of websites, so we are confident that it can scale to very large numbers of data elements in semantic repositories -- we are only within the past couple of years starting to see vendors coming out with the actual tools that enable us to do these semantic mappings and to aggregate the data.
The other piece needed for this to really be effective across companies -- and I think in 2012 we are really seeing this come together -- is a matching set of standards work. When we map a particular set of data and then we want to exchange that with one of our clients, or we receive it from another financial institution, we don't want to have to do a translation at that point. So, starting a couple of years ago under the coordination of an organization called the Enterprise Data Management Council, or EDM Council -- which has membership from pretty much all of the financial services organizations, as well as some of the regulators -- people have been working on some of these standards.
In March of this year the EDM Council, in conjunction with the Object Management Group, which has taken on responsibility for the standard-setting in this space, had the first conference devoted to semantic technology. It was in New York City, and they thought they might get 50 people to show up. They had over 200 attendees, and it wasn't just the numbers. It was the institutions that were represented and the level of person they were sending. We had people who represented chief data officers from some of the largest financial institutions in the world. And vendors were there talking about these tools. So, it is a convergence of standards, technology and awareness on the part of people like ourselves and other companies, that is really going to get benefits out of this technology.
In the second part of this interview, Saul explains the benefits, as well as the challenges, of getting semantic databases off the ground.
Let us know what you think about the story; email Linda Tucci, Senior News Writer.