In part one of this SearchCIO.com Trailblazer interview, "Cracking the code for big data analytics," Mok Oh, PayPal Inc.'s chief scientist, discussed the current state of big data analytics and the enormous task of extracting useful buyer and seller information for the San Jose, Calif., electronic payments provider. In this second part of a wide-ranging discussion, he dissects the value of big data, discusses the limitations of big data tools and muses on the holy grail of big data analytics, a state he calls "Analyst 3.0."
Can you spell out how, exactly, you connect a customer and a merchant?
Oh: Let's say, Neiman Marcus has some wonderful sale and they want to make sure they spend their marketing dollars correctly -- or have more delightful or relevant information in the advertising for their consumers. We can create that list. We can target those people much better than others can.
The reason we call this "data science" is that there is a science component to it. We have great scientific leaders, we are very connected to academia, where a lot of the latest and greatest things happen.
On the data side, PayPal has awesome data -- we have transactional data. And that itself -- I am going to be very bold and bullish -- is the strongest signal from which we can start predicting people's buying behaviors. As opposed to strictly online data, this is behavioral data. People look at this, they look at that, they go to this website, that website. There are signals there that help us predict, but in commerce, transactional data is the strongest signal.
So, by plotting our buying behavior over time, the idea is that you will know more about my buying habits than I do -- that this data science taps into the subconscious of the buyer?
Oh: Absolutely. You can break that down into a couple of buckets. One is explicit shopping intentions. On the other hand, people who really want a delightful shopping experience may have intentions they are not aware of; lots of marketers, especially on the retail side, are spending tons and tons of money to optimize the layout, the traffic patterns that people are going to walk through. Lots of research has been poured into that to make sure there is a delightful experience. Basket size increases because serendipitous stuff happens: "I am looking at shoes, and by the way, here are some awesome socks, I am going to get those too."
We are really trying to increase people's appetites or trying to delight them more, making sure that they are looking at certain things. So yes, I am trying to model the subconscious piece of that shopping experience.
This does sound a lot like what Amazon and the credit card companies are doing -- they must be trying to tap this value of big data.
Oh: It's the holy grail, yes.
I certainly implore all your CIO readers to do the same. I don't think this is a big hype.
So, yes, if you generalize, yeah, we'll all do similar things, but at the end of the day, PayPal will go use case by use case. We want to make sure there is a highly efficient and strong connection between merchants and consumers.
We talk a lot about volume, velocity and variety being characteristics of big data. For companies that don't have millions and millions of customers, like PayPal, or for companies that are more specialized, is there value in doing this type of big data analytics?
Oh: Yeah, I think so. It might be small science/small data. It always helps. A lot of startups, for example, don't have enough data to prove or disprove that their widget works. As time goes by, more and more people are realizing that data is an asset, or something that they should own and capitalize on.
With that said, it will get tougher and tougher to get data. That would be my prediction for the future, because people are going to be more and more grabby. There are going to be a lot of players popping out that are making a business model out of this need. You're a startup or you're a company that needs data? We got it -- but you have to pay for it or trade for it. There are tons of those companies that are data providers out there, so companies can certainly leverage that as well.
But again, common sense should prevail. If you're in the business of selling tacos, the strongest signal is going to be, who bought tacos? Even if there is tons of data out there, there are very obvious things to look at. People who bought tacos probably will continue to buy tacos, and I need to find like-people who will also buy my tacos.
I think the patterns going forward will be that people will leverage data as an asset. They are going to invest more in data, and I am sure there will be more walled gardens going up, just looking at Apple and Facebook. Those are walled gardens, not just because of PII [personally identifiable information], but because they know their data is very valuable for them.
You can be very creative in how you get your data -- you can acquire it yourself, buy it, make deals with others.
Since you are in charge of all things "science-y," what problem are you thinking through right now that is related to your science-y duties?
Oh: Overall, I think a lot of people see big data as a structured-versus-unstructured-data problem. I have been thinking about this for a while now. One of the biggest assets will be understanding unstructured data out there, because I will argue that 99.9% of the information that we have is unstructured, meaning computers can't put it into a table, can't sort it into a table. These are things like email or tweets or reviews or blogs.
With "Likes," they're structured: You either like it or you don’t. Whereas when you say, "I love the taco truck, I wish they had a location here," that's very, very valuable information. If somebody tweets that, I want to understand it: I want to know who said that, I want to know who are the merchants who can benefit from this as well, so we can make sure we connect those two together. Again, 99.9% of the data out there is something that computers cannot easily understand.
So the problem then becomes, how do we make sure that they can? If there are images and videos and email, audio, songs, there is no real way to understand beyond the category. We need to find a better way to understand that data, especially with the explosion of information out there. YouTube gets, what, 50 to 60 hours of video every minute! For every minute, you have over a couple of days' worth of time. That's ridiculous. How do you know what is relevant to you? Similarly, we are learning a lot more about merchants and the things they say. I call them "digital artifacts" or "digital inferences" -- there have got to be ways to structure that output and leverage that for all our customers. That's the bigger pie in the sky -- maybe that is 3.0! Natural languages, analog-to-digital conversions and making sure there is very little lost in translation.
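Editor's illustration: the contrast Oh draws between a structured "Like" and a free-text post can be sketched in a few lines of Python. This is a toy under loud assumptions, not PayPal's method. The `Like` and `Mention` records, the `KNOWN_MERCHANTS` and `POSITIVE_WORDS` lists and the `structure_post` function are all hypothetical names invented for this example, and simple keyword matching stands in for the natural-language understanding Oh says is still needed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Like:
    """Already structured: it fits a table row as-is."""
    user: str
    merchant: str
    liked: bool


@dataclass
class Mention:
    """A structured record recovered from free text (hypothetical schema)."""
    user: str
    merchant: Optional[str]
    positive: bool
    wants_location: bool


# Hypothetical lookup lists; a real system would need far more than keywords.
KNOWN_MERCHANTS = ["taco truck"]
POSITIVE_WORDS = {"love", "great", "awesome"}


def structure_post(user: str, text: str) -> Mention:
    """Turn a free-text post into a Mention using naive keyword matching."""
    lowered = text.lower()
    # First known merchant name found in the text, or None if there is none.
    merchant = next((m for m in KNOWN_MERCHANTS if m in lowered), None)
    words = set(re.findall(r"[a-z']+", lowered))
    positive = bool(words & POSITIVE_WORDS)
    wants_location = "wish" in words and "location" in words
    return Mention(user, merchant, positive, wants_location)


like = Like(user="alice", merchant="taco truck", liked=True)
mention = structure_post(
    "alice", "I love the taco truck, I wish they had a location here"
)
print(like)
print(mention)
```

The "Like" needs no processing at all, while the post yields who said it, which merchant it concerns and a demand signal only after an extraction step. The keyword approach breaks on sarcasm, misspellings or any merchant missing from the list, which is the gap Oh describes.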