What technical challenges face voice AI technology? There are many

Conversational AI will change enterprise communications -- eventually. In the meantime, Gartner analyst Svetlana Sicular details the areas where voice AI technology needs to grow.

Brian Holak

Published: 29 Mar 2018

At the Gartner Data and Analytics Summit, analyst Svetlana Sicular made the case that enterprise communications of the future will become more verbal. But it won't be because of a surge in face-to-face conversations or chats over the phone.

Voice AI devices -- with assistance from the likes of Amazon Alexa, Microsoft's Cortana, Apple's Siri and Google Assistant -- will be the medium through which we talk to each other and to our software.

Sicular advised CIOs and other digital leaders to leave the competitive technology innovation in the voice AI field to the major players -- Amazon, Microsoft, Google and Apple, to name the biggies -- and to use these tech titans' platforms as foundations for their own business cases.

That should be a relief for enterprise IT leaders because, as it turns out, there is still a mountain of technical challenges to overcome when it comes to voice AI, also called conversational AI. As Sicular said, "Think of this time as the time of silent movies. Color and sound are coming, but they're not yet here."

Gartner analyst Svetlana Sicular at the Gartner Data and Analytics Summit in Grapevine, Texas.

For starters, the responses that voice AI systems give ideally should be nuanced -- we humans should feel as if we're speaking with a fellow human, or nearly so. But right now -- as is clear to anyone with a smart speaker or an assistant on their phone -- conversational AI is a command line, not an actual conversation.

"When you dictate something to Siri, it doesn't mean that Siri understands," Sicular said.

Instead, Siri and other AI assistants are simply transferring that input through different -- and separate -- levels of capability. First, these assistants have to convert the input speech to text. Text understanding -- grasping the intent and what people are actually asking -- is then a separate step. The final step is response generation, also known as text-to-speech. These steps need to become more streamlined, integrated and intelligent, Sicular said, for real conversation.
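
To make the three separate stages concrete, here is a minimal Python sketch of such a pipeline. It is an illustration only, not how Siri or any other assistant is built: every function name, the keyword rule and the canned replies are hypothetical stand-ins, and the "audio" is simply text encoded as bytes so the example can run without a speech engine.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical container for what the user is asking for."""
    name: str
    slots: dict = field(default_factory=dict)


def speech_to_text(audio: bytes) -> str:
    # Stage 1 (stub): assume the "audio" is already UTF-8 text.
    return audio.decode("utf-8")


def understand(text: str) -> Intent:
    # Stage 2 (stub): a keyword rule standing in for intent detection.
    if "weather" in text.lower():
        return Intent("get_weather")
    return Intent("unknown", {"utterance": text})


def generate_response(intent: Intent) -> str:
    # Stage 3 (stub): a canned reply that a real system would then speak.
    if intent.name == "get_weather":
        return "Here is today's forecast."
    return "Sorry, I did not understand that."


def handle(audio: bytes) -> str:
    # Each stage only sees the previous stage's output, nothing more.
    return generate_response(understand(speech_to_text(audio)))
```

Because each stage receives only the output of the one before it, an error or a lost nuance in an early stage cannot be recovered later, which is one way to read Sicular's call for tighter integration.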

Another hurdle to overcome is context. Voice AI systems still need to learn how to be context-aware, which means that they should respond differently depending on which device a person is speaking to, where that person is, what time it is and so on. Asking the same question, therefore, could elicit different responses depending on contextual factors. This requires time sensitivity, geolocational awareness and integration with other real-time metrics, she said.

Sicular said voice AI technology also needs to get better at dealing with ambiguity and variability, which tie back to the need for greater contextual awareness. An example of ambiguity is a user telling a voice AI assistant "Wake me up at 10." The user shouldn't have to specify a.m. or p.m.; the assistant should determine that on its own based on time of day. In terms of variability, there are many different ways to say the same thing -- a simple example being "nearby," "close by" and "close to me" -- and voice AI assistants need to be familiar with all of them.
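
Both examples can be sketched in a few lines of Python. This is a hedged illustration, not a method Sicular described: the function names are hypothetical, the phrase list contains only the three phrases quoted above, and the a.m./p.m. rule (pick whichever 10 o'clock comes next) is one assumed heuristic among several possible ones.

```python
from datetime import datetime, timedelta

# Variability: the three phrasings from the article map to one canonical token.
NEARBY_PHRASES = {"nearby", "close by", "close to me"}


def normalize_proximity(phrase: str) -> str:
    cleaned = phrase.strip().lower()
    return "NEARBY" if cleaned in NEARBY_PHRASES else cleaned


def resolve_alarm(hour_12: int, now: datetime) -> datetime:
    """Ambiguity: choose the next upcoming a.m. or p.m. occurrence of the hour."""
    if not 1 <= hour_12 <= 12:
        raise ValueError("hour_12 must be between 1 and 12")
    base = hour_12 % 12  # 12 o'clock maps to hour 0 (a.m.) and hour 12 (p.m.)
    candidates = []
    for hour in (base, base + 12):
        when = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if when <= now:
            when += timedelta(days=1)  # already passed today, so use tomorrow
        candidates.append(when)
    return min(candidates)
```

Under this assumed rule, "Wake me up at 10" spoken at 11 p.m. resolves to 10 a.m. the next morning, while the same request at 3 p.m. resolves to 10 p.m. that evening.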

The technology also must get better at "start, stop and continue" points, or knowing when to start listening, stop listening and continue listening, Sicular said. When there is a long pause between words or in the middle of a request, voice AI assistants need to know the best way to react.
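
One simple way to picture the "start, stop and continue" decision is a pause timer over short audio frames. The Python sketch below is an assumption-laden illustration, not a description of any shipping assistant: the function name, the 20-millisecond frame length and the 800-millisecond pause limit are all hypothetical, and the per-frame speech flags are taken as given rather than computed from audio.

```python
def endpoint_decision(frames, frame_ms=20, max_pause_ms=800):
    """Return 'waiting', 'continue' or 'stop' for a list of per-frame speech flags.

    frames: booleans, True where a (hypothetical) detector heard speech.
    """
    if not any(frames):
        return "waiting"  # nothing said yet, so listening has not started

    # Count the silent frames at the end of the recording.
    trailing_silence = 0
    for is_speech in reversed(frames):
        if is_speech:
            break
        trailing_silence += 1

    if trailing_silence * frame_ms >= max_pause_ms:
        return "stop"  # the pause is long enough to treat the request as finished
    return "continue"  # a short pause, so keep listening
```

A fixed threshold like this is exactly where the difficulty lies: set it too short and the assistant cuts people off mid-request, too long and it feels unresponsive.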

One positive note from Sicular: As of last year, Alexa and Google Assistant have the ability to recognize different voices, which is a big step forward. Unfortunately, they still fall short on providing truly individualized experiences. In general, Sicular thinks customization is lacking in voice AI systems. She said companies need to offer customers the ability to choose an AI assistant's personality and name.

Based on Sicular's long list of technology hurdles facing voice AI, being able to name an AI assistant might be low priority. But technical improvements in conversational AI, along with a multitude of assistant names, will almost certainly come sooner than we think. Gartner predicted that by 2021, AI augmentation will create $2.9 trillion of business value. "That's a Hollywood blockbuster, not a silent movie," Sicular said.
