Voice is the new interface, and every major device is starting to build in the capability. My car, phone, and computer all integrate voice, both speaking and listening. Alexa and Google Home have broken into the mainstream, and they are just getting started. These voice-activated assistants answer questions, play music, and perform a collection of other at-home tricks.
While these new voice devices are amazing, we’re still living in the DOS days of voice. What do I mean? In the early days of computers, we didn’t have graphical user interfaces; we had command-line interfaces, a blinking cursor waiting for you to type a command in a very specific way. Want to change folders? It’s “cd.” Want to delete a file? It’s either “rm” or “del,” and be careful that you don’t accidentally wipe your entire machine. We’re in the same era of voice. You can’t just say something, you have to say it just so.
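To make the command-line comparison concrete, here’s a minimal sketch (the folder and file names are just illustrative) of that rigidity: every command must be typed exactly, with no room for natural phrasing.

```shell
mkdir -p demo                   # create a folder -- "mkdir," not "make me a folder"
echo "draft" > demo/report.txt  # create a file inside it
rm demo/report.txt              # "rm" to delete -- one wrong path and it's gone, no undo
```

Type “remove report” instead of “rm demo/report.txt” and the computer simply refuses. Today’s voice assistants fail the same way when you stray from their expected phrasing.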
“Alexa, ask Spotify to play rock music on living room speakers.”
Who talks like that?
Human conversations have subtlety, context, back-and-forth, interjections, and situational awareness. Conversations don’t have “skills” that need to be installed. As voice technologies continue to advance, they will sound and behave more and more conversationally.
Siri was first, Alexa created a market, Google is expanding it.
Siri was a small startup that Apple acquired in 2010. At the time, Siri was doing some basic voice work, but it was largely a digital assistant. Amazon introduced the Echo in 2014, and its breakthrough was largely an improvement in response performance. Where Siri could take a few seconds to respond, the Echo responded near-instantly. Performance is often a key software innovation and differentiator that isn’t talked about often enough. With the basics of assistance and performance in place, the next major advancement has been Google’s broad ability to interpret language.
While we’re nowhere near the HAL 9000 or Star Trek fiction of voice recognition, we are getting closer to it every day. Earlier this year Google announced Duplex, a new voice technology that it’s testing.
Now, talking computers are nothing new. We’ve had talking robots in Star Wars since the 1970s, and the idea that a computer could speak and carry out helpful tasks has been a sci-fi narrative for many years. What’s different about Duplex is that it’s not just a computer that we know is a computer; it’s a computer pretending to be human.
The advancement of Duplex shows how much closer we are to solving some of the hard problems of natural speech: interjections, interpretation, and open-ended conversations.
At Rightpoint we’re working on open-ended speech problems and how they can apply to real product innovations. We believe that the digital kiosks and mobile apps of the future will be much closer to the natural conversations we’ve been having for years.
In 1950, Alan Turing devised a test to see whether a computer could pass as human; it was later called the Turing test. The test sparked a race to build software that could pass for intelligent. While there’s been a ton of progress in computer science toward the Turing test, the next chapter seems to be the voice Turing test, and by that measure, we’re getting much closer.
Interested in having voice capabilities added to your product experience? Rightpoint works with Alexa, Google and Cortana technologies to build tomorrow’s voice apps today. Contact us to learn more.