What is the current best speech recognition API for iOS to match a few keywords? [closed]

試著忘記壹切 submitted on 2019-11-28 03:43:23

If you want to track just a few keywords, you should not look for a speech recognition API or service. This task is called keyword spotting, and it uses different algorithms than speech recognition. Speech recognition tries to find all the words that have been said, and because of that it consumes far more resources than keyword spotting. A keyword spotter only tries to find a few selected keywords or keyphrases. It is far simpler and far less resource-intensive.

The only possible solution to achieve this functionality is to use an open source package like OpenEars, which is powered by Pocketsphinx:

http://www.politepix.com/openears

OpenEars has a Rejecto plugin that implements something similar.

Pocketsphinx itself has recently implemented effective open source keyword spotting too, but it has not made it into OpenEars yet. It is only available through the Pocketsphinx API: you need to create a kws search and set the target word to look for. I hope this functionality will reach OpenEars soon.

Nuance gives developers free access (but not for high volume). See http://www.masshightech.com/stories/2011/09/26/daily13-Nuance-tweaks-mobile-dev-program-with-free-access-to-Dragon.html or http://dragonmobile.nuancemobiledeveloper.com/public/index.php?task=home

Nuance services are typically offered commercially and require upfront fees and transaction fees. The interesting news above is that they now make low-volume use of their services available to developers for free. So, for development, testing, and demonstration, you can probably use the free Nuance services. However, unlike the Google services that come free in Android, if your app has thousands of users you will likely have to pay for Nuance services.

We have been developing the CeedVocal SDK since 2008. It is based on the Julius and FLite open source projects.

Here's some context: we wanted to build our app (Vocalia) for speech recognition back in 2008, and basically picked Julius (we hesitated over Pocketsphinx, which appears to be good as well) and optimized its file format so that it would boot in 1-2 sec instead of 20 sec on the original iPhone. Then we dutifully trained our own acoustic models in 6 languages. We designed the API, and eventually decided to offer it to other developers as an SDK.

CeedVocal basically supports 2 modes of operation:

  1. matching of words (or small phrases)
  2. keyword spotting

In the first mode of operation, it tries to align the input speech to a word (or phrase) in its list of acceptable inputs. This forces the input to a pre-known word, even if the speech is something else. Accuracy is good. In the second mode of operation, it tries to pick out one of its keywords from the stream of speech. This is a more difficult case, and it can be less accurate.
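
To make the difference between the two modes concrete, here is a toy sketch in Python. It is not CeedVocal's API and it does not process audio: it works on already-transcribed text, and a string similarity score from the standard library's difflib stands in for acoustic scoring. The function names (match_closed_set, spot_keywords) and the 0.8 threshold are assumptions made up for this illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for an acoustic score: 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def match_closed_set(heard, vocabulary):
    """Mode 1 (hypothetical name): force the input onto the vocabulary.

    Always returns the closest entry, even when the input is
    something else entirely.
    """
    return max(vocabulary, key=lambda word: similarity(heard, word))


def spot_keywords(stream_words, keywords, threshold=0.8):
    """Mode 2 (hypothetical name): scan a stream for keywords.

    Reports (position, keyword) only when the best score reaches the
    threshold; everything else in the stream is ignored.
    """
    hits = []
    for position, word in enumerate(stream_words):
        scored = [(similarity(word, k), k) for k in keywords]
        best_score, best_keyword = max(scored)
        if best_score >= threshold:
            hits.append((position, best_keyword))
    return hits


if __name__ == "__main__":
    vocabulary = ["start", "stop", "pause"]
    # Out-of-vocabulary input is still mapped to some vocabulary entry.
    print(match_closed_set("banana", vocabulary))
    # Only the word that clears the threshold is reported.
    print(spot_keywords("please stop the music now".split(), ["stop", "play"]))
```

The sketch shows why the second mode is harder: the spotter has to decide, for every piece of the stream, whether anything matched at all, so its accuracy depends on the threshold. The first mode never makes that decision, because it always returns its best candidate.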
