BOSTON -- A blind man in a dark room is looking for his black cat, and he can't find it. He calls in a sighted person for help. He can't find the cat either but is more confounded than the owner. Because the room is dark and the cat is black, the sighted person can't presume the cat isn't in the room.
Anthony Scriffignano and his data science team at Dun & Bradstreet Inc. work on problems like this all the time: They search for data that is elusive -- maybe hiding in plain sight or not there at all. Scriffignano, senior vice president and chief data scientist at the financial services company, calls them black cat problems.
"It's a term I made up," he said in an interview with SearchCIO at the recent AI World. "You have to do a lot of that when you're in this space," he quipped, "because a lot of what we're talking about, the nouns and the verbs don't have names yet."
Black cat problems aren't for the faint of heart -- you have to accept that the black cat may not be in the room. (Indeed, Scriffignano said that the first step in solving a black cat problem is to stop whining.) In data exploration, these are problems of undefined shape and size -- new types of fraudulent activity, what the next big customer will look like -- and require a test-and-learn mindset to systematically work through them. They can be proactive explorations of data, such as seeking out new criminal behavior (think installing smoke detectors, Scriffignano said), or reactive explorations of data, such as investigating whether an event triggered changes in behavior. In either case, there may be no "there" there in the end, he said.
One of the recurring black cat problems for Scriffignano's team is sussing out nefarious activity such as identity theft. Scriffignano said that it's important to first establish a definition of what identity theft is so the data science team has a baseline. The team then uses different tools to, in part, classify the data, segment the data and build graphic representations, which Scriffignano said "is a big part of this."
Fraudsters tend to interact with other fraudsters and with certain types of customers (those perceived as easy marks, for example), and they tend to repeat the same behavior as they go from one victim to the next. Graphs can chart the relationships and interactions of a network. An analysis of the network can uncover new patterns or identify anisotropic regions -- a term borrowed from the field of biology to mean a cluster of unusual relationships and behaviors, Scriffignano said.
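To make the idea of charting a network and looking for unusual clusters concrete, here is a minimal sketch in Python. It is not Dun & Bradstreet's method; the account names, the edge list and both thresholds are hypothetical, and the "unusual" test (well-connected accounts whose contacts mostly deal with one another) is just one simple stand-in for the kind of pattern an analyst might look for.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) interaction pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = sorted(graph[node])
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)


def unusual_nodes(graph, min_degree=3, min_clustering=0.8):
    """Flag well-connected nodes whose contacts mostly interact with each other.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    return sorted(
        node
        for node in list(graph)
        if len(graph[node]) >= min_degree
        and clustering(graph, node) >= min_clustering
    )


# Hypothetical interactions: a, b, c and d all deal with one another,
# while d, e, f, g and h form an ordinary chain.
edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h"),
]
graph = build_graph(edges)
print(unusual_nodes(graph))  # ['a', 'b', 'c']
```

A flagged cluster is only a candidate for further review. It says nothing on its own about whether the behavior is fraudulent.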
But identifying an anisotropic region doesn't automatically mean the discovery of fraud. "The tricky part is when you find it, you're not done," he said. The behavior may not be nefarious, but instead some type of new behavior that hasn't been seen before. The results have to be disambiguated to make sense out of them. And that requires more hypotheses and more testing, Scriffignano said.
Before any action is taken, the data science team turns the results over to skilled experts to make a final determination.
"The state of the art in most cases for the type of malfeasance that we look for is to reduce the complexity of the problem to the point where some really skilled people can finish the task," he said. Or not.
Deloitte's five vectors of progress
It's still early days for AI, but consultants at Deloitte believe barriers to entry are beginning to fade. They've compiled "five vectors of progress" in AI tech that could accelerate adoption and push it into the mainstream. The five vectors are as follows:
- Automating the data science process. Much of what data scientists do is "grunt work," David Schatsky, managing director at Deloitte, said at AI World. They spend a big swath of time preparing the data they want to analyze. Today, tools on the market are automating many of those steps, making data scientists more efficient and giving companies a chance to run more experiments in the same time period, Schatsky said.
- Reducing the need for training data. One of the drawbacks to machine learning is the amount of labeled training data needed to get a model working. "Some companies don't have it, can't get enough or it's proprietary and there are various constraints on it," Schatsky said. But techniques are emerging that can help companies overcome data scarcity. One is called synthetic data, which is data "generated algorithmically to mimic the characteristics of the real data," according to "Machine learning and the five vectors of progress," an article co-written by Schatsky. Another technique is known as transfer learning, which uses AI to apply learning from one data set to a new domain.
- Accelerating training. The computation process needed to train a machine learning model can take hours, days and sometimes weeks to run just to see if the model's any good. Improvements to the hardware that underpins how models are trained are enabling engineers to "do things in parallel that will close the loop more quickly," Schatsky said.
- Explaining results. Machine learning algorithms operate in a so-called black box: How they arrive at the conclusions they do is unknown. It's a turn-off to managers in regulated industries or to those who oversee a sensitive area of the business. But, according to Schatsky, the black box problem is "being tackled step by step."
- Deploying locally. Soon, machine learning will be deployed on the edge in mobile telephones and internet of things devices due to compact models that require relatively little memory and "a whole new generation of low power chips," Schatsky said.
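As a rough illustration of the synthetic data idea mentioned in the list above, the following Python sketch fits a mean and standard deviation to each column of a small sample and then draws new rows from those distributions. It is a deliberately simple assumption, not the technique described in the Deloitte article: the sample values and column meanings are hypothetical, each column is treated as an independent Gaussian, and any correlation between columns is ignored.

```python
import random
import statistics


def fit_columns(rows):
    """Estimate (mean, standard deviation) for each column of the sample."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def synthesize(params, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [
        tuple(rng.gauss(mu, sigma) for mu, sigma in params)
        for _ in range(n)
    ]


# Hypothetical "real" sample: (order amount, number of items).
real_rows = [
    (120.0, 3.0),
    (95.0, 2.0),
    (130.0, 4.0),
    (110.0, 3.0),
    (105.0, 2.0),
]

params = fit_columns(real_rows)
synthetic_rows = synthesize(params, n=2000)
print(len(synthetic_rows), "synthetic rows generated")
```

Production tools model far more structure than this, but the principle is the same: learn the characteristics of the data that exists, then generate more data that resembles it.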
A call to action at AI World
"It's exciting to see the level of progress we've made in the field, but I want to reiterate one more time: Is that enough? This is a very important time, and the stakes are higher than ever before in terms of the field of AI and the promise that it holds for the future.
"There are really two outcomes: Either AI will live up to the hype and expectations, or it won't, and it will fail. And I believe everyone in this room -- we're all stakeholders in this world -- would join me in wanting AI to be successful. And if we want AI to be successful, we have to separate hype from reality; we need to understand how these algorithms operate and the constraints that they're subjected to." -- Tolga Kurtoglu, CEO, Palo Alto Research Center Inc.