I have always felt that open source technologies make it possible to create and capture more value than proprietary, locked-in systems. This belief was so strong that I even wrote my thesis about it.

Yet in the landscape of linguistic technologies and speech recognition, very few companies have gone the open source way. Big players like Apple and Google acquire or develop protected technologies in-house; others rely on API solutions offered by Nuance, Google or AT&T. One reason is surely that these technologies are strongly protected by patents. In this case, the value is captured through intellectual property protection and cannot be unlocked.

At the same time, numerous open source solutions are emerging in the research world. What is extremely interesting is that a start-up using open source technologies for speech recognition applied to the Internet of Things, robotics, mobile and wearables has come onto the radar of MIT Technology Review[i].

The company, called Wit.ai, offers voice recognition solutions that hardware and software manufacturers can integrate into the connected devices that are now everywhere. It makes it easy to control automated systems with voice commands, and Wit.ai’s innovation is to limit the number of commands in order to improve the quality of voice recognition.
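To see why a limited command set helps, here is a minimal sketch in Python. It is not Wit.ai’s implementation: the command list, the function name match_command and the 0.6 similarity cutoff are all assumptions made for illustration. The idea is simply that when only a handful of phrases are valid, even a noisy transcript can be snapped to the nearest one.

```python
import difflib

# Hypothetical command set for a connected lamp and speaker (not from Wit.ai).
COMMANDS = [
    "turn on the light",
    "turn off the light",
    "dim the light",
    "play music",
]

def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the known command closest to a noisy transcript, or None.

    The cutoff is an assumed similarity threshold: anything less similar
    than that to every known command is rejected instead of guessed.
    """
    # Normalize case and whitespace before comparing.
    normalized = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("turn of the lite"))
print(match_command("what is the weather tomorrow"))
```

With an open vocabulary, a misheard phrase like the first one would stay wrong; with a closed set, it can still resolve to a valid command, while unrelated speech is rejected.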

Digging deeper into the company’s website, one can see that the Wit.ai team is probably using Sphinx[ii] and Kaldi[iii] to power its speech recognition engine. Both are open source technologies: Sphinx was developed at Carnegie Mellon University, and Kaldi is distributed under the Apache License v2.0.

Although freely available, these technologies are no less powerful than those created by Google or Nuance Communications. A natural question, then, is why not everyone is using them. The answer is just as natural: like all open source technologies, they provide only the foundation, and the linguistic models still have to be created. Moreover, a statistical model requires a huge corpus of written and read-aloud text for training the engine with machine learning algorithms.
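To make the role of the training corpus more concrete, here is a toy sketch of a statistical language model in Python. It is only an illustration under stated assumptions: the four-sentence corpus, the function names train_bigrams and score, and the add-one smoothing are my own choices, not what Sphinx or Kaldi actually do internally. Real engines train far richer models on vastly more text.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count word pairs in a corpus; "<s>" marks the start of a sentence."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def score(sentence, context_counts, bigram_counts, vocab):
    """Probability of a sentence under the bigram model with add-one smoothing."""
    words = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        probability *= (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))
    return probability

# Hypothetical miniature corpus; a real one would hold millions of sentences.
corpus = [
    "turn right at the corner",
    "turn left at the corner",
    "write a letter",
    "write the report",
]
model = train_bigrams(corpus)

# Two candidates that sound alike; the model prefers the one seen in context.
candidates = ["turn right at the corner", "turn write at the corner"]
best = max(candidates, key=lambda c: score(c, *model))
print(best)
```

The model can only prefer word sequences it has seen, which is why the size and coverage of the corpus matter so much.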

This may also explain why Wit.ai opted for the creative solution of limiting voice recognition to a set number of commands. This is surely a first step that makes it possible to launch in production, as suggested by the company’s cofounder and CEO Alex Lebrun, according to whom “as more data is added to the system, the non-English languages will improve”[iv].

Another advantage of using open source technologies for speech recognition is that they work across platforms and operating systems, which is crucial in a mobile world divided between Android and iOS, and for home automation devices. Most interesting of all, Sphinx is practically the only system offering offline speech recognition across mobile platforms. The only other solution is Google’s Android speech recognition API, which is limited to the most recent Android versions (introduced with the Android 4.4 KitKat API). Incidentally, Wit.ai intends “to build and train voice interactions and then download it so it can be used on, say, a smartphone, without needing an Internet connection”[v].

At the moment, mobile hardware (processors and memory) does not make offline speech recognition easy. A large linguistic model has to be loaded onto the device and searched every time a phrase is recognized, which takes time even on state-of-the-art processors. With models trained in advance and restricted to a limited number of phrases, however, this will surely work soon. Semantic technologies can also enhance the system, since recognizing words in a given context should be easier. Just imagine what it would mean for the security of your information and the availability of your device if voice recognition no longer needed a remote server and a broadband Internet connection!

All this still requires additional effort, but one can already conclude that open source technologies make it possible to create value in the long term, even if their initial use requires more savvy and enthusiastic developers.

 


[i] “Voice Recognition for the Internet of Things”, MIT Technology Review, October 24, 2014, online http://www.technologyreview.com/news/531936/voice-recognition-for-the-internet-of-things/, accessed on October 25, 2014

[ii] “CMU Sphinx - Speech Recognition Toolkit”, Carnegie Mellon University, 2014, online http://cmusphinx.sourceforge.net/, accessed on March 13, 2014

[iii] “Kaldi”, online http://kaldi.sourceforge.net/index.html, accessed on May 22, 2014

[iv] “Voice Recognition for the Internet of Things”, MIT Technology Review, October 24, 2014, online http://www.technologyreview.com/news/531936/voice-recognition-for-the-internet-of-things/, accessed on October 25, 2014

[v] “Voice Recognition for the Internet of Things”, MIT Technology Review, October 24, 2014, online http://www.technologyreview.com/news/531936/voice-recognition-for-the-internet-of-things/, accessed on October 25, 2014