Andrew Ng, artificial intelligence researcher and chief scientist at Baidu, expects speech recognition to have a much larger impact on mobile and IoT devices in the future than we imagine today.
Speech recognition has come a long way over the last couple of decades and is starting to take off, but many people still interact with devices predominantly by typing on a keyboard or touchscreen, said Ng. That’s because speech recognition is not yet accurate enough at transcribing words and understanding requests to be more convenient than typing.
“Speech recognition, depending on the circumstances, is, say, 95 per cent accurate. So maybe it gets one word in 20 wrong. That’s really annoying if it gets one in 20 wrong, and you probably don’t want to use it very often. That’s probably where speech recognition is today,” Ng said at the CeBIT event in Sydney today.
But Ng’s focus on speech recognition at Baidu might just do away with the need to type out instructions for devices altogether. The future is all about talking to machines, he said.
“I think that as speech recognition accuracy goes from, say, 95 per cent to 98, 99, to 99.9, all of us in the room will go from barely using it today or infrequently to using it all the time.
“Most people underestimate the difference between 95 and 99 per cent accuracy – 99 per cent is a game changer.”
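To make the arithmetic behind Ng’s figures concrete, here is a small sketch. Ng does not say which metric his percentages refer to; the sketch assumes, as a simplification, that “accuracy” means the share of words transcribed correctly (roughly one minus the word error rate) and that errors are spread evenly. Under that assumption, 95 per cent means one error every 20 words, while 99 per cent means one every 100, a fivefold reduction.

```python
def words_per_error(accuracy_pct: float) -> float:
    """Average number of words per transcription error.

    Assumes accuracy is the percentage of words transcribed correctly
    and that errors are spread evenly (a simplification).
    """
    error_rate = 1 - accuracy_pct / 100
    return 1 / error_rate


for acc in (95, 98, 99, 99.9):
    # Round for display; floating-point division is not exact.
    print(f"{acc}% accurate: about one error every {words_per_error(acc):.0f} words")

# How many times fewer errors at 99% than at 95%.
print(f"Errors reduced by a factor of {words_per_error(99) / words_per_error(95):.0f}")
```

Going from 99 to 99.9 per cent is another tenfold reduction, to roughly one error per 1,000 words, which helps explain why Ng describes the last few percentage points as the ones that matter most.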
He said being able to home in on the user’s voice without being thrown off by background noise is still a major challenge for speech recognition technology, and one that, if overcome, would be a big step forward.
“You are driving home and you need to send a text message. In a noisy car today, I wouldn’t even try to use speech recognition.”
When it comes to the Internet of Things, speech recognition will have a major role in providing seamless interaction, Ng said.
“I hope that some day I’ll have grandchildren who come to me and say, ‘Hey grandpa, do you remember back when you were young where you said something to your microwave or oven and it would just sit there and ignore you? That just seems rude.’”
Ng said deep learning is needed to improve the capability of speech recognition. Deep learning uses neural networks, which loosely simulate the way the brain transmits data, with many hidden layers between the input and the output.
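The structure described above, an input passed through several hidden layers before reaching an output, can be sketched in a few lines of plain Python. This is only an illustration of the layered architecture, not Baidu’s system: the layer sizes, the ReLU activation, the random (untrained) weights and the idea of feeding in a 13-value audio frame are all hypothetical choices made for the example.

```python
import random


def relu(x: float) -> float:
    """Common hidden-layer activation: pass positives, zero out negatives."""
    return max(0.0, x)


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: each output is activation(weights . inputs + bias)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Hypothetical random initialisation; a real network would learn these from data."""
    weights = [[rng.gauss(0, 1 / n_in ** 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(layer_sizes, seed=0):
    """One (weights, biases) pair per connection between consecutive layers."""
    rng = random.Random(seed)
    return [random_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]


def forward(network, inputs):
    """Pass the input through every hidden layer, then the output layer."""
    activations = inputs
    for i, (weights, biases) in enumerate(network):
        is_output = i == len(network) - 1
        # Hidden layers use ReLU; the output layer is left linear here.
        act = (lambda z: z) if is_output else relu
        activations = dense_layer(activations, weights, biases, act)
    return activations


# Hypothetical sizes: 13 inputs, three hidden layers of 32 units, 5 outputs.
sizes = [13, 32, 32, 32, 5]
net = build_network(sizes)
rng = random.Random(1)
frame = [rng.uniform(-1, 1) for _ in range(13)]
output = forward(net, frame)
print(len(output), "output values from", len(net) - 1, "hidden layers")
```

What makes such a network “deep” is simply the stack of hidden layers between input and output; the learning Ng refers to would come from adjusting the weights on large amounts of data, which this sketch does not attempt.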
“Deep learning algorithms, also called neural networks, just keep on getting better as you give it more data. And so in the regime of big data, they are outperforming the older generation of algorithms.”
However, there is still a long way to go before deep neural nets are efficient and effective enough to match even a fraction of what the human brain is capable of, Ng said.
“For those of us on the frontline writing and shipping code, even though these learning algorithms take loose inspiration from the brain, when you dive into the details, they are really nothing like the human brain.”