Most people don’t speak clearly enough, or perhaps eloquently enough, to interact precisely with voice assistants like Siri and Amazon Alexa, now available on the iPhone.
Most people mumble to some extent or use incorrect grammar when they speak, especially when they’re alone or relaxing at home – that is, when they are not speaking in public. So, until Siri and the other virtual robots have learned to take account of that reality, the vast majority of smartphone or computer users will not be having conversations with them.
In the film Her, a human is able to talk to his computer in such a natural way that he actually falls in love with it. There’s probably no chance of that happening with Siri or any of the other voice assistants on the market now, although there is a small percentage of people who do feel compelled to say hello to Siri every day.
And a small percentage of iPhone users could be some tens of millions of people. More than 1 billion iPhones are floating around the planet right now, so even 1 per cent is 10 million. Imagine charging a small fee to 10 million people for using a premium version of your voice assistant.
Judging from a range of surveys over the past couple of years, more than half of all smartphone users do not interact with the onboard voice assistant – whether it’s Siri, Cortana, Google Now, or whatever it is that Google is calling its voice assistant.
Most people just aren’t interested, and won’t be unless it takes less effort than it does now. No one wants to take elocution lessons just to be able to talk to a voice assistant, although there may be a market for them: if you do learn to communicate adequately with Siri or Amazon Alexa, you could at least save a lot of time searching for things online.
And that’s the other problem: even if you could communicate productively with Siri and the others, what is it that they can do apart from internet searches, which you could do far quicker by typing?
The advantage of using voice assistants is yet to be conclusively proven to most people. But if they do keep progressing and become more powerful and capable, most people would agree that they will basically be how everyone uses computers.
Not many journalists enjoy transcribing an interview. As a test to see if I could save time and effort, I recently used the dictation function in Google Docs to transcribe an interview, thinking it would be 90 per cent wrong, or just would not work.
But to my surprise, the numbers were the opposite: 90 per cent of what the computer typed out was exactly what I said. I had to tidy it up as I went along, but overall, I would say it saved me an hour or two, and about 2,500 words of typing.
Now, it’s just a question of getting accustomed to using the system. I may even use it to write stories like this one, although it seems really odd to even think about it.
Those are the stumbling blocks for this technology, or natural language processing, as one might call it: people want voice assistants to be flawless because they compare them to humans, and they expect them to perform functions more quickly and more usefully than they could in other ways – by typing, basically.
A lot of people probably think they can type faster than they can speak to a voice assistant. This is probably not the case. The average person can type around 40 words a minute, while they can speak at around 100 words a minute.
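To put rough numbers on that comparison, here is a minimal sketch in Python. It assumes steady rates of 40 and 100 words a minute and a 2,500-word piece, and it ignores pauses and the time spent correcting dictation errors, so treat the result as a best case rather than a measurement. The function name is made up for illustration.

```python
def minutes_to_produce(words: int, words_per_minute: float) -> float:
    """Return the minutes needed to produce `words` at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return words / words_per_minute


# Assumed figures: average typing and speaking rates, and a 2,500-word piece.
TYPING_WPM = 40
SPEAKING_WPM = 100
WORDS = 2500

typing_minutes = minutes_to_produce(WORDS, TYPING_WPM)
speaking_minutes = minutes_to_produce(WORDS, SPEAKING_WPM)
minutes_saved = typing_minutes - speaking_minutes

print(f"Typing:   {typing_minutes:.1f} minutes")
print(f"Speaking: {speaking_minutes:.1f} minutes")
print(f"Saved:    {minutes_saved:.1f} minutes")
```

On those assumptions, dictating takes well under half the time of typing, before any tidying up is counted.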
Even fast typists reach just over 100 words a minute, which might be too fast a speaking rate for Siri and Amazon Alexa for the time being. But the question is, for how long? How long before these voice assistants cross those thresholds and are able to understand what you say no matter how fast you speak? And how long before they can iron out all the quirks of accent, grammar and other speaking styles and still perform expertly?
These are just some of the questions that the big tech companies are trying to answer, knowing that the average worker probably spends several hours a day typing emails and other documents. And whichever company is the first to answer these questions most efficiently through its technology is certainly in line to become the first trillion-dollar company.