How to increase Google's Speech Recognition accuracy for separated numbers

Posted by 筅森魡賤 on 2019-12-11 02:23:26

Question


We show this image to our users:

[image]

The picture shows separate numbers, and all of our users read it into their microphones as "11-0-9-5".

We use Google's speech engine, and it returns this result:

"1109 5".

This makes it impossible for us to compare the spoken words with the expected result, and we're stuck at this stage.

Is there a way to tell Google's Speech Recognition to interpret spoken numbers literally and separately, instead of joining them together?


Answer 1:


You can try using a speech context to bias Google's speech engine toward a predefined set of numbers: https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#SpeechContext

So if you specify 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 as possible phrases, Google should avoid sending back 1109, as it is not among them. Keep in mind that these phrases are hints that make certain results more likely, not a strict whitelist.
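As a rough illustration of the idea above, here is a minimal sketch of a request body for the REST `speech:recognize` method with the numbers passed in `speechContexts`. The helper name `build_recognize_request`, the audio settings (`LINEAR16`, 16 kHz, `en-US`), and the dummy audio bytes are assumptions made for the example. It only builds the JSON and does not send anything.

```python
import base64
import json


# Hypothetical helper (not part of any Google library): builds the JSON body
# for a POST to https://speech.googleapis.com/v1/speech:recognize
def build_recognize_request(audio_bytes, phrases, language_code="en-US"):
    return {
        "config": {
            "encoding": "LINEAR16",      # assumption: 16-bit PCM audio
            "sampleRateHertz": 16000,    # assumption: 16 kHz recording
            "languageCode": language_code,
            # SpeechContext: phrases the recognizer should favor
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # Inline audio must be base64-encoded in the JSON body
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# The phrases suggested in the answer: "0" through "11"
number_phrases = [str(n) for n in range(12)]

# Dummy silent audio, only so the example runs without a real recording
request_body = build_recognize_request(b"\x00\x00" * 160, number_phrases)
print(json.dumps(request_body["config"], indent=2))
```

The resulting dictionary could then be sent with any HTTP client together with your credentials. Whether the hints are enough to keep "11 0 9 5" separated would need to be checked against real recordings.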

However, with this method you have to list all possible values, which can be tedious, and some cases still won't be solved, for example if someone pronounces 11 as "1-1".
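For the cases that speech contexts do not solve, one possible workaround (not part of the original answer) is to compare only the digit sequence, ignoring how the engine grouped the numbers. The sketch below assumes the transcript contains digits rather than spelled-out words; `digits_only` and `matches_expected` are hypothetical helper names.

```python
import re


def digits_only(text):
    """Drop every character except the digits 0-9."""
    return re.sub(r"[^0-9]", "", text)


def matches_expected(transcript, expected):
    """Compare two strings by their digit sequence, ignoring grouping."""
    return digits_only(transcript) == digits_only(expected)


# "1109 5" and "11-0-9-5" both reduce to "11095"
print(matches_expected("1109 5", "11-0-9-5"))  # prints True
```

The trade-off is that grouping information is lost: a user who says "1-1-0-9-5" would also be accepted, which may or may not be what you want.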



Source: https://stackoverflow.com/questions/51376672/how-to-increase-googles-speech-recognition-accuracy-for-separated-numbers
