For many months now, the news in the world of intelligent technologies has been dominated by deep learning algorithms. With increased computational power, deep learning has become the technology driving more powerful image recognition, machine translation and natural language understanding.

The most recent hype is about using deep learning to improve speech recognition. Google has reportedly built a listening engine powered by deep learning to improve speech recognition[i] and enhance services such as voice search and voice commands on smartphones, as well as Google Now and voicemail transcription for Google Voice.

According to a blog post by Francoise Beaufays, a Google research scientist, “Deep Neural Networks revolutionized the field of speech recognition”[ii]. The post adds that “Things really improved rapidly with Recurrent Neural Networks (RNNs)”[iii], which “have additional recurrent connections and memory cells that allow them to ‘remember’ the data they’ve seen so far”[iv], much as a listener interprets each word based on the previous words in a sentence.
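To make the idea of a recurrent connection concrete, here is a minimal sketch of a toy recurrent unit in plain Python. It is not Google's model: the weights, sizes and the names `rnn_step` and `run_sequence` are assumptions chosen for illustration, and a production system would use far larger networks with memory cells. The sketch only shows how a hidden state carried from one step to the next lets the network "remember" earlier input.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One step of a toy recurrent unit: h_t = tanh(W_in x + W_rec h_prev + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = bias[i]
        total += sum(w_in[i][j] * x[j] for j in range(len(x)))
        # The recurrent term feeds the previous hidden state back in.
        total += sum(w_rec[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def run_sequence(inputs, w_in, w_rec, bias):
    """Run the unit over a sequence, starting from an all-zero hidden state."""
    h = [0.0] * len(bias)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states

# Hypothetical hand-picked weights: 1 input feature, 2 hidden units.
W_IN = [[1.0], [0.5]]
W_REC = [[0.5, 0.0], [0.0, 0.5]]
BIAS = [0.0, 0.0]

# Both sequences end with the same input (0.0); only the first input differs.
states_a = run_sequence([[1.0], [0.0]], W_IN, W_REC, BIAS)
states_b = run_sequence([[0.0], [0.0]], W_IN, W_REC, BIAS)

print("final state after [1, 0]:", states_a[-1])
print("final state after [0, 0]:", states_b[-1])
```

Although the last input is identical in both runs, the final hidden states differ, because the first input is still echoed through the recurrent connection.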

Francoise Beaufays stresses the importance of adapted, consistent language models built on text corpora that actually help speech recognition. As the example in the blog post shows, a corpus of classic literature is of little use for modern speech recognition: written and spoken registers differ, and the language itself evolves over the years.
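The mismatch between a literary corpus and spoken language can be sketched with a toy unigram language model. Everything here is an assumption made for illustration: the two tiny "corpora" are invented sentences, the add-one smoothing is just one simple choice, and real language models for speech recognition are far richer. The point is only that a model trained on text in the right register assigns a higher probability to a typical voicemail phrase.

```python
import math
from collections import Counter

def train_unigram(text):
    """Count word occurrences in a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())

def avg_log_prob(counts, vocab_size, sentence):
    """Average per-word log-probability with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    words = sentence.lower().split()
    log_prob = 0.0
    for word in words:
        log_prob += math.log((counts[word] + 1) / (total + vocab_size))
    return log_prob / len(words)

# Invented miniature corpora (assumptions, not real training data).
LITERARY_TEXT = ("whereupon the gentleman did declare that he should "
                 "visit the lady ere the morrow")
SPOKEN_TEXT = ("hey call me back when you get this ok i will text you "
               "the address later thanks bye")
UTTERANCE = "call me back later ok"

literary_counts = train_unigram(LITERARY_TEXT)
spoken_counts = train_unigram(SPOKEN_TEXT)

# Shared vocabulary so that both models are smoothed the same way.
vocab = set(literary_counts) | set(spoken_counts) | set(UTTERANCE.split())

literary_score = avg_log_prob(literary_counts, len(vocab), UTTERANCE)
spoken_score = avg_log_prob(spoken_counts, len(vocab), UTTERANCE)

print("literary model:", round(literary_score, 3))
print("spoken model:  ", round(spoken_score, 3))
```

The spoken-register model scores the utterance higher (a less negative log-probability), which is the intuition behind training on text that resembles what people actually say.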

The examples of improved speech recognition in voicemail transcription, and especially the cases of learned punctuation, are stunning!

In the broader context, it is not surprising that Google is working on improved speech recognition. Breaking news has it that Nuance, a big player in the field, is coming to market with a new version of its flagship product, Dragon: this time a cloud-based solution available on both iOS and Android mobile devices[v]. The solution targets professionals (lawyers, technicians, insurance workers, etc.), and “in addition to dictation and accurate voice recognition, the app promises continual learning of a person’s voice for improved accuracy and support for industry-specific custom words”[vi].

Going further into the world of artificial intelligence and deep learning, the big news this summer was the introduction of Facebook's personal assistant, M.

Last year we talked about Wit.ai, an amazing start-up that developed offline speech recognition technology based on open source software. Wit.ai has since been acquired by Facebook and has contributed greatly to creating the personal assistant software powered by artificial intelligence technologies. M differs from Google Now, Cortana and Siri in its even more personalized and “human”-like approach. “Alex LeBrun, who leads the Wit.ai team within Facebook, says that artificial intelligence not only makes M better for accomplishing generalized tasks, but also for cases with very special exceptions, like traveling with an infant or during blackout dates”[vii]. An interesting approach taken by the Wit.ai team at Facebook is to build an element of randomness into the training programs for the personal assistant, which is responsible for M's human-like character. According to LeBrun, it is meant “to bring it closer to human learning. This means that it will sometimes try to find novel, more efficient ways to do a common task”[viii].
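The article does not say how M introduces randomness, so the following is only a sketch of one well-known way to do it, epsilon-greedy exploration, and should not be read as Facebook's method. The action rewards, the `epsilon` value and the names `choose_action` and `train` are all assumptions. The learner usually repeats the best approach it knows, but occasionally tries a random one, which is how it can stumble on a more efficient way to do a common task.

```python
import random

def choose_action(estimates, epsilon, rng):
    """Pick the best-known action, or a random one with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def train(true_rewards, epsilon, steps, seed=0):
    """Learn reward estimates for each action by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        action = choose_action(estimates, epsilon, rng)
        reward = true_rewards[action]  # deterministic reward, for simplicity
        counts[action] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical task with three ways of doing it; the second is the best.
TRUE_REWARDS = [0.2, 0.9, 0.5]

greedy_estimates, greedy_counts = train(TRUE_REWARDS, epsilon=0.0, steps=1000)
explore_estimates, explore_counts = train(TRUE_REWARDS, epsilon=0.1, steps=1000)

print("no randomness:   ", greedy_counts)
print("with randomness: ", explore_counts)
```

Without randomness the learner settles on the first approach that works and never tries the others; with a small amount of randomness it discovers the better approach and then uses it most of the time.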

After a period of calm, we are hearing great news in the personal assistant domain, with the development of Cortana, a more powerful Siri and a more knowledgeable Google Now. M is a new player, but it surely stands out and brings another aspect to this software category, which is based on assessing data, analyzing it and making it actionable: a tool that combines speech recognition and language understanding to the point of being able to perform tasks efficiently and smoothly.

 

Image credit: pixabay.com


 

[i] Google Turns to Deep Learning to Fix Speech Recognition by Tom Dawson for Android Headlines, August 12, 2015, online http://www.androidheadlines.com/2015/08/google-turns-to-deep-learning-to-fix-speech-recognition.html, accessed on August 23, 2015

[ii] The neural networks behind Google Voice transcription by Francoise Beaufays, Research Scientist for Google Research Blog, August 11, 2015, online http://googleresearch.blogspot.co.uk/2015/08/the-neural-networks-behind-google-voice.html, accessed on August 24, 2015

[iii] The neural networks behind Google Voice transcription by Francoise Beaufays, Research Scientist for Google Research Blog, August 11, 2015, online http://googleresearch.blogspot.co.uk/2015/08/the-neural-networks-behind-google-voice.html, accessed on August 24, 2015

[iv] The neural networks behind Google Voice transcription by Francoise Beaufays, Research Scientist for Google Research Blog, August 11, 2015, online http://googleresearch.blogspot.co.uk/2015/08/the-neural-networks-behind-google-voice.html, accessed on August 24, 2015

[v] Nuance brings professional-grade speech recognition to mobile by Patrick Seitz for Investors.com, August 18, 2015, online http://news.investors.com/081815-767088-nuance-communications-nuan-revamps-dragon-product-lineup.htm?ven=yahoocp,yahoo&src=aurlled, accessed on August 24, 2015

[vi] Dragon Anywhere mobile speech recognition app planned for iOS and Android by Jackie Dove for The Next Web News, August 19, 2015, online http://thenextweb.com/apps/2015/08/19/dragon-anywhere-mobile-speech-recognition-app-planned-for-ios-and-android/, accessed on August 24, 2015

[vii] Inside Facebook's Artificial Intelligence Lab by Dave Gershgorn for Popular Science, 2015, online http://www.popsci.com/facebook-ai, accessed on September 22, 2015

[viii] Inside Facebook's Artificial Intelligence Lab by Dave Gershgorn for Popular Science, 2015, online http://www.popsci.com/facebook-ai, accessed on September 22, 2015