
To work for us, AI virtual assistants need to speak, see like humans

The race is on to support conversational technologies and make AI virtual assistants that we trust to do things for us, said Julie Ask, principal analyst at Forrester Research, during a keynote presentation at the research firm’s recent New Tech Forum in Boston.

But the only real contenders in the race — at least right now — are the tech giants of the world, according to Ask. Google, Facebook, Amazon and Apple already broker a lot of consumer interactions, but they’re far from done. Each of them wants to become the platform of the future that completely “owns consumers’ moments” by having an AI virtual assistant that can “do everything,” Ask said.

Some of the conversational technologies needed to support that ambition are ready today, but others are going to take time, she said. Intelligent assistants, as they’re also known, do a fine job of telling you the weather or ordering you more paper towels, but those interactions are far from true conversations.

“For these services to really be valuable and fundamentally more convenient than just picking up a phone to talk to someone or opening up an app to get something done, these conversations really need to evolve,” Ask said.

The good news is that consumers are ready for conversational technologies. Three times as many people use voice assistants on their phones and smart speakers as did a couple of years ago, said Ask. There are also 15 million smart speakers — which include Amazon Echo and Google Home — in U.S. households today, she added.

However, two advanced conversational technologies still need to evolve to make intelligent assistants more productive: natural language generation and image recognition.

There’s a lot that goes into natural language generation, or the ability for machines to speak and sound like a human being. It takes human babies years of failure, success, repetition and refinement to develop language skills, Ask pointed out. She believes it will take as long, if not longer, for machines to hone their speaking skills.

“It’s going to take years — if not tens of years — to incorporate all of the data [necessary] to really begin to generate [human-like] language, sentences that have inflection and so forth,” said Ask.

The second conversational technology that Ask said needs to evolve to make way for AI virtual assistants that can “do everything” is image recognition, which she said will dramatically improve conversational abilities.

“When you think about having a conversation with a friend or even with a brand, we’re not just talking,” Ask said. “A lot of times we’re pointing and saying ‘What about that?,’ ‘Let’s go there for dinner,’ ‘Let’s make that.’ Conversations aren’t just about text; there’s a much richer experience that goes with that.”

Plus, it is exhausting to try to describe everything with just words, Ask added. Image recognition needs to provide more context; it must go beyond just identifying objects in photos, and be able to identify what people are doing and the emotions behind a look, a task or a stance.

Ask said the accuracy and the breadth of this technology are indeed growing, but, once again, maturation is a matter of data: massive amounts are needed to feed these AI conversational technologies. Whereas a human might need only a couple of visual examples to learn a new cursive style, a machine might need tens of thousands or hundreds of thousands of examples before it can comprehensively read handwritten words and sentences.
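Ask’s data-hunger point can be illustrated with a toy experiment (not from her talk): a minimal nearest-neighbor “recognizer” trained on synthetic two-class data, where accuracy typically climbs as the pool of labeled examples grows. The data, class centers and sample sizes here are all hypothetical, chosen only to make the effect visible.

```python
import math
import random

random.seed(0)  # make the toy experiment reproducible

def sample(label, n):
    """Draw n labeled 2-D points; class 0 centers at (0,0), class 1 at (2,2)."""
    cx, cy = (0.0, 0.0) if label == 0 else (2.0, 2.0)
    return [((random.gauss(cx, 1.0), random.gauss(cy, 1.0)), label)
            for _ in range(n)]

def predict(train, point):
    """1-nearest-neighbor: answer with the label of the closest training example."""
    return min(train, key=lambda ex: math.dist(ex[0], point))[1]

def accuracy(train, test):
    hits = sum(1 for point, label in test if predict(train, point) == label)
    return hits / len(test)

test_set = sample(0, 200) + sample(1, 200)
small = sample(0, 3) + sample(1, 3)        # a handful of labeled examples
large = sample(0, 300) + sample(1, 300)    # a hundred times more

acc_small = accuracy(small, test_set)
acc_large = accuracy(large, test_set)
print(f"accuracy with 6 examples:   {acc_small:.2f}")
print(f"accuracy with 600 examples: {acc_large:.2f}")
```

The gap between the two runs is the point: with overlapping classes, a recognizer’s ceiling is reached only by piling on labeled data — and real handwriting or imagery is vastly higher-dimensional than this two-dimensional sketch, so the appetite for examples grows accordingly.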

“It’s going to take a lot of time for all of these moving pieces to come together and really deliver these experiences that we consider to be magical,” Ask said.

Parenting an AI virtual assistant is harder than you’d think.