Technology breakthroughs always seem to come with a warning. With big data, the experts' caution was that it's not about the data: Collecting big data is pointless unless you analyze it and get some quantifiable business gain for your efforts. But today, making hay on big data is no longer just about the analytics. Companies chasing the next competitive edge will need to be hyperpersonal, and that will require speed and compute power.
"If you're not able to create innovations to market in a timely, quick, effective way, then all of the technology is useless," Alfred Essa said at the recent Spark Summit East in Boston, Mass.
Essa is the vice president of research and data science at McGraw-Hill Education, a 128-year-old textbook publisher that's shedding its skin to become a learning science company. It's developing digital platforms and education tools to better understand how students learn and to provide personalized lessons for students.
And it's operating in a highly competitive, if not cutthroat, space. Other textbook publishers such as Houghton Mifflin Harcourt and Pearson have skin in the game, along with unconventional competitors such as media companies and edtech startups. When a field becomes noisy, speed is a competitive differentiator.
McGraw-Hill relies on an "innovation pipeline," a three-step process that starts with the initial designs for models and algorithms and ends with product deployment, to ensure innovative products hit the market quickly. The key is rapid prototyping so that software products can be placed into the hands of customers and begin producing critical data used to iterate on the designs.
"Once the product ships, the models and algorithms are not fixed, they are first approximations," Essa said. "So we have to be able to collect data on a continuous and rapid basis to tune our models and update our models and algorithms."
Not surprisingly, Apache Spark, a big data processing engine touted for its speed, underpins McGraw-Hill's innovation pipeline. Essa's team used it to launch StudyWise, a mobile app for med students, in eight months. And a couple of data scientists are using it to answer one of the most common questions teachers ask: Can students at risk of failing or dropping out of a class be identified before it happens? That software product is entering the second stage of the innovation pipeline, product validation.
"As a machine learning exercise, it's pretty straightforward," Essa said. "But because these models have to serve and scale to millions of users, we're doing this in Spark."
Compute power is key
The drive to build hyperpersonal digital products is also happening in retail, according to Mike Gualtieri, an analyst at Forrester and a Spark Summit presenter. Businesses want to know their customers so well that they essentially function like old-fashioned corner stores. To attain such intimate knowledge of their customers, retailers will need to "learn individual characteristics of customers, learn their behaviors under certain circumstances and predict those needs in real time," he said.
That's easier said than done, even when using artificial intelligence (AI) techniques such as machine learning "to process, analyze information and scale our understanding of each and every one of our customers," Gualtieri said.
Machine learning models tend to be narrow in scope, which means companies will need a substantial portfolio if they want to provide hyperpersonal customer service. A lot of models means a lot more scale, Gualtieri said. To illustrate his point, he did some back-of-the-envelope math: If a company were interested in predicting 10 characteristics, 10 behaviors and 10 needs for every customer, it may need 30 AI models per customer.
And if a company has 25 million customers? "I know what you're thinking," Gualtieri interrupted himself. "OK, one model can apply to a segment of customers. I get it, right?" The numbers may be a little hyperbolic, but the point is if retailers want to provide intimate customer service, they're going to need more compute power -- and not just for scale.
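Gualtieri's back-of-the-envelope arithmetic is easy to write down: one model per predicted attribute gives the per-customer count, and multiplying by the customer base gives the worst case. The segment count below is a hypothetical figure, not one from the talk:

```python
characteristics = behaviors = needs = 10
models_per_customer = characteristics + behaviors + needs  # 30 models
customers = 25_000_000

# Worst case implied by the talk: distinct models for every customer.
total_models = models_per_customer * customers
print(total_models)  # 750 million models

# The concession Gualtieri makes: one model can serve a customer segment.
segments = 1_000  # hypothetical segmentation, for illustration only
print(models_per_customer * segments)
```

Even the segmented version leaves a company maintaining tens of thousands of models, which is the real point: the compute bill scales with how personal the service gets.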
Another wrinkle is the technology needed to deliver hyperpersonal service quickly: Companies will have to analyze data in real time, a drastic change from traditional data analytics, which are performed after the fact. "We absolutely need streaming to feed our applications and to continuously freshen these models," he said. "Again, lots of compute power."
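The "continuously freshen" idea can be illustrated in miniature with an exponentially weighted running estimate that updates on every event, so recent behavior outweighs stale history. This stdlib sketch only shows the pattern; in production the same role would be played by a streaming engine such as Spark's, and the event values are invented:

```python
class FreshenedModel:
    """Toy stand-in for a model parameter that is updated per event.

    An exponentially weighted moving average lets recent observations
    dominate older ones -- continuous freshening in miniature.
    """
    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to each new observation
        self.estimate = None

    def update(self, observation):
        if self.estimate is None:
            self.estimate = observation
        else:
            self.estimate = ((1 - self.alpha) * self.estimate
                             + self.alpha * observation)
        return self.estimate

# Simulated event stream: a customer metric drifting upward over time.
model = FreshenedModel(alpha=0.3)
for event in [1.0, 1.0, 1.2, 1.5, 2.0, 2.0, 2.0]:
    model.update(event)
print(round(model.estimate, 2))  # estimate has tracked the drift upward
```

Batch analytics would have averaged the whole history after the fact; the streaming update reflects the drift as it happens, which is what real-time personalization requires.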
The good news for CIOs is that enterprise-ready AI deployments are still relatively nascent. In a survey Forrester conducted last summer, interest in AI still outweighs the actual practice of it. While 58% of respondents said they're researching AI, only 19% reported that they're currently training models, according to Gualtieri. There's still time to get going.