How “ok google” technology is implemented [closed]

Posted by 放荡痞女 on 2019-12-01 14:45:14

Keyword spotting is usually implemented with dynamic programming: you search for the best-scoring chunk of audio containing the keyword, considering all possible start points and all possible end points. You need to model both the keyword and its alternatives. At every moment in time you score both the keyword and other sounds, and once the probability of the keyword is higher than the probability of other speech, you raise the signal. The false alarm rate is controlled by a threshold on that comparison. You do not need to handle silence separately, because it is covered by the "other speech" model. The algorithm is covered in detail in the following thesis:

http://eprints.qut.edu.au/37254/
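
The idea described above can be sketched in a few lines of Python. This is only a toy illustration, not how pocketsphinx or the thesis implements it: the function name `spot_keyword`, the left-to-right keyword states, and the hand-made scores are all assumptions made for the example. It accumulates the log-likelihood ratio between the keyword model and the "other speech" model, lets the keyword start at any frame, and fires whenever the final keyword state exceeds the threshold.

```python
NEG_INF = float("-inf")


def spot_keyword(keyword_ll, background_ll, threshold):
    """Return the frame indices at which the keyword detector fires.

    keyword_ll[t][s]: log-likelihood of frame t under keyword state s
                      (states are visited left to right)
    background_ll[t]: log-likelihood of frame t under the "other speech" model
    threshold:        minimum log-likelihood ratio needed to raise the signal
    """
    n_states = len(keyword_ll[0])
    # score[s] is the best keyword-vs-background log-likelihood ratio of any
    # partial keyword path that ends in state s at the previous frame.
    score = [NEG_INF] * n_states
    detections = []
    for t, (frame, bg) in enumerate(zip(keyword_ll, background_ll)):
        new_score = [NEG_INF] * n_states
        for s in range(n_states):
            stay = score[s]
            # State 0 may also start a fresh keyword at this frame with a
            # ratio of 0; this is what covers "all possible starts".
            advance = score[s - 1] if s > 0 else 0.0
            new_score[s] = max(stay, advance) + frame[s] - bg
        score = new_score
        # Checking the last state at every frame covers "all possible ends".
        if score[-1] > threshold:
            detections.append(t)
    return detections


# Toy scores: frames 2, 3 and 4 match keyword states 0, 1 and 2 in turn.
HIT, MISS = -0.5, -5.0
keyword_ll = [
    [MISS, MISS, MISS],
    [MISS, MISS, MISS],
    [HIT, MISS, MISS],
    [MISS, HIT, MISS],
    [MISS, MISS, HIT],
    [MISS, MISS, MISS],
    [MISS, MISS, MISS],
]
background_ll = [-1.0, -1.0, -3.0, -3.0, -3.0, -1.0, -1.0]

print(spot_keyword(keyword_ll, background_ll, threshold=5.0))  # prints [4]
```

Raising the threshold in this sketch trades missed detections for fewer false alarms, and lowering it does the opposite.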

For an implementation of keyword spotting, you can check pocketsphinx and the pocketsphinx Android demo. Pocketsphinx is a C library that can spot words in a continuous audio stream. You can find the tutorial here:

http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

To spot a keyword from the microphone, you can try something simple like:

  pocketsphinx_continuous -inmic yes -keyphrase "ok google" -kws_threshold 1e-20
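
If you want to react to detections from a script, one option is to launch that same command as a subprocess and read its output. The sketch below is only an assumption about how you might wire it up: `build_command` and `listen` are hypothetical helper names, and it assumes that `pocketsphinx_continuous` is on your PATH and prints each detected keyphrase on its own line of standard output.

```python
import subprocess


def build_command(keyphrase="ok google", threshold=1e-20):
    """Build the argument list for the command shown above."""
    return [
        "pocketsphinx_continuous",
        "-inmic", "yes",
        "-keyphrase", keyphrase,
        "-kws_threshold", str(threshold),
    ]


def listen(keyphrase="ok google", threshold=1e-20):
    """Yield one line of output each time the keyphrase appears in it.

    Assumption: detections are written to stdout, one per line. Log messages
    on stderr are discarded.
    """
    proc = subprocess.Popen(
        build_command(keyphrase, threshold),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        for line in proc.stdout:
            if keyphrase in line.lower():
                yield line.strip()
    finally:
        # Stop the recognizer when the caller stops iterating.
        proc.terminate()
```

You would then loop over `listen()` and run your own action on each yielded line. If you get too many false alarms, try a larger threshold such as 1e-10; if the keyphrase is missed, try a smaller one such as 1e-30.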

The original "Ok Google" technology is described in the following publication:

"Small-Footprint Keyword Spotting Using Deep Neural Networks" by Guoguo Chen, Carolina Parada, and Georg Heigold

https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenSemester2201314/chen2014small.pdf

It is a fairly advanced technology and, more importantly, it requires a lot of specialized data for training.
