Google is taking advantage of its cloud infrastructure and the huge volume of typed search queries to refine its Voice Search function, part of a massive research effort in voice that spans both mobile devices and the web. Voice Search, introduced about 18 months ago, lets mobile users search the web by speaking into their phones rather than typing in a query. It's available on the iPhone, BlackBerry, Nokia Series 60 devices and some Android phones.
Accuracy is a major factor in the service's success, because useful results bring users back, said Michael Cohen, manager of speech technology at Google, in a speech at the Mobile Voice Conference. The company strives to make Voice Search a "frictionless" experience for the user, with correct results obtained easily. Making speech recognition more accurate has been a decades-long effort, and Google is applying its massive scale to the problem, Cohen said.
Voice Search is based on "language models," which are statistical models of what sequences of words are most likely to occur. For example, a good language model would know that it's more likely a speaker would say "the dog barked" than "the dog talked."
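To make the "dog barked" example concrete, here is a minimal sketch of a bigram language model, one of the simplest statistical models of word sequences. It is an illustration only, not Google's method: the tiny corpus, the function names and the add-alpha smoothing are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows another in a toy corpus."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts


def sequence_score(model, sentence, vocab_size, alpha=1.0):
    """Score a word sequence as a product of smoothed bigram probabilities.

    Add-alpha smoothing (an assumption of this sketch) keeps unseen word
    pairs from scoring exactly zero.
    """
    words = sentence.lower().split()
    score = 1.0
    for prev, nxt in zip(words, words[1:]):
        counts = model.get(prev, Counter())
        total = sum(counts.values())
        score *= (counts[nxt] + alpha) / (total + alpha * vocab_size)
    return score


# Hypothetical four-sentence corpus, invented for illustration.
corpus = [
    "the dog barked",
    "the dog barked loudly",
    "the dog ran",
    "the man talked",
]
vocab = {word for sentence in corpus for word in sentence.split()}
model = train_bigram_model(corpus)

barked = sequence_score(model, "the dog barked", len(vocab))
talked = sequence_score(model, "the dog talked", len(vocab))
print(barked > talked)  # the model prefers "the dog barked"
```

Because "barked" follows "dog" in the toy corpus and "talked" never does, the model rates the first sentence as more likely. A production system would use far more data and longer word histories, but the principle is the same.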
Google is constantly "training" new language models for its speech recognition engine, Cohen said. In doing so, it taps into the search terms that users type into Google.com. From 230 billion words typed in search requests at Google.com, researchers have compiled the 1 million most frequently used unique words to form a vocabulary with which to train the voice system. Both numbers are arbitrary, and 230 billion does not represent the total number of words entered at Google in any given period, Cohen said. AskOxford.com, from the publisher of the Oxford English Dictionary, estimates that there are at least 250,000 words in the English language. Cohen said the 1 million unique words include plurals and other versions of words.
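The vocabulary step Cohen describes, keeping only the most frequently typed unique words, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the sample queries, the `build_vocabulary` name and the whitespace tokenization are hypothetical, and the cutoff here is three words rather than 1 million.

```python
from collections import Counter


def build_vocabulary(queries, max_words):
    """Return the max_words most frequent unique words across all queries."""
    counts = Counter()
    for query in queries:
        # Naive whitespace tokenization; "flight" and "flights" would stay
        # separate entries, much as plurals count separately in Cohen's figure.
        counts.update(query.lower().split())
    return [word for word, _ in counts.most_common(max_words)]


# Hypothetical typed queries, invented for illustration.
queries = [
    "cheap flights to boston",
    "flights to new york",
    "weather boston",
    "flights boston hotels",
]
vocabulary = build_vocabulary(queries, max_words=3)
print(vocabulary)
```

Counting surface forms rather than dictionary headwords is one reason a vocabulary like this can be larger than a dictionary's estimate of the number of words in English.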
It takes 70 "CPU years" to process those 230 billion words from Google.com and train a new language model, Cohen said; a CPU year is the amount of work one CPU can perform in a year. Google trains new language models constantly as part of its research.
"There are huge computational demands as we're taking on lots and lots of data (and) bigger and bigger models," Cohen said. "Luckily, we have a lot of compute power we can apply to that. And there are demands on infrastructure, and luckily, Google has a very well designed software infrastructure, so we can do things like quickly parallelise something," running it on thousands of computers at the same time, he said.
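A rough back-of-the-envelope calculation shows why that parallelism matters for a 70-CPU-year job. The sketch below assumes a hypothetical cluster of 1,000 CPUs and perfect scaling; Cohen gave no exact machine count, and real jobs lose some efficiency to coordination overhead.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def wall_clock_hours(cpu_years, num_cpus, efficiency=1.0):
    """Estimate elapsed hours for a job spread evenly across num_cpus.

    efficiency is the assumed fraction of ideal speedup actually achieved
    (1.0 means perfect scaling, which is optimistic).
    """
    return cpu_years * HOURS_PER_YEAR / (num_cpus * efficiency)


# 70 CPU years is the figure Cohen cited; 1,000 CPUs is an assumption.
hours = wall_clock_hours(70, 1000)
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
```

Under these assumptions, a job that would occupy a single CPU for 70 years finishes in under a month, which is what makes constant retraining practical.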
A cloud infrastructure offers other advantages in speech recognition, he said. For one thing, Google can rapidly test and refine its speech recognition software, sending out new versions while consumers are using it in the field. In addition, as consumers use Voice Search, Google learns from real-world experience.
In addition to making speech recognition easier to use, Google wants to make it ubiquitously available. A big step in that direction was a feature included in the Nexus One handset that gives the user the option of speaking instead of typing every time the keyboard pops up on the phone's screen, Cohen said.
Speech recognition is also a big part of Google Voice, powering its voicemail transcription feature. But Google's interest in voice goes beyond mobile phones, Cohen said. Voice is the biggest group in Google Research, and findings in this area can be useful in many areas, he said. The company wants to be able to understand and deliver spoken content on the web as well as the written information it finds now through its search engine. One recent move was the addition of a closed-caption option for YouTube videos. Using that capability, Google is also beginning to offer foreign-language subtitles through text-to-text translation of those captions.
Cohen was a co-founder of Nuance Communications and has been working on speech recognition for 25 years. In that time, "It's come a long way, but it has a long way to go," he said.
Microsoft is also developing voice search capabilities for its Bing search engine.