Can the Google Speech API be configured to return only numbers / letters?

Submitted by 荒凉一梦 on 2020-05-13 07:14:47

Question


Can the Google Speech API be configured to only return numbers and letters, as opposed to full words?

The use case is transcribing spoken Canadian postal codes, e.g. "M 1 B 0 R 3", for which Google may return "Em 1 Be 0 Are 3".

We have tried:

  • Using speechContexts and feeding in the letters A - Z as individual phrases. This improved accuracy for us. We did not have much success passing in individual digits (e.g. 1, 2, 3).
  • Specifying the codec and sample rate of our WAV file using the encoding and sampleRateHertz configuration options. We saw no improvement from this, which we assume is because Google already auto-detects the sample rate and encoding well.

Our audio file is sampled at 8000 Hz and encoded with "M-ULAW" (mu-law). We cannot change the sample rate or encoding.
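For reference, the configuration described above can be sketched as the JSON body of a `speech:recognize` REST request. This is only a minimal sketch: the helper name `build_recognize_request` and the `en-CA` language code are assumptions for illustration, and the body is built locally without being sent anywhere.

```python
import base64
import json
import string


def build_recognize_request(audio_bytes, language_code="en-CA", include_digits=False):
    """Build a request body with letter hints and explicit audio settings.

    The function name and the default language code are illustrative
    assumptions; the field names follow the REST RecognitionConfig.
    """
    # One phrase per letter, as tried in the question.
    phrases = list(string.ascii_uppercase)
    if include_digits:
        # The question reports little benefit from digit phrases, so this is opt-in.
        phrases += [str(d) for d in range(10)]
    return {
        "config": {
            "encoding": "MULAW",        # mu-law audio
            "sampleRateHertz": 8000,    # 8000 Hz telephony audio
            "languageCode": language_code,
            "speechContexts": [{"phrases": phrases}],
        },
        # The REST API expects the audio content as a base64 string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    body = build_recognize_request(b"\x00\x01\x02")
    print(json.dumps(body["config"], indent=2))
```

Keeping the hint list as data makes it easy to experiment with other phrase sets, for example whole sample postal codes instead of single letters.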

Is there a way to get a more accurate response from Google for this use case? Even ideas for better speechContexts phrases are welcome.

Thank you


Answer 1:


We are experiencing the same results. We would love to have a syntax-based "context" suggestion, or a parameter that forces the result to contain only digits.

Changing the API version does not fix the way digits are recognised, and neither does using model: phone_call.

What did work better for some kinds of numbers was switching to the en-US locale, which caused the recognition engine to treat a list of numbers as a phone number. The result came back in phone-like syntax, +XXX-XXX-XXX-XXXX, and that made detection very good.

So I don't understand why Google does syntax matching behind the scenes but doesn't make it available through its API.
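Since no such syntax parameter is described in this thread, one possible workaround is to apply the postal-code pattern on the client side after recognition. The following is only a sketch of that idea, not a feature of the Speech API: the function `normalize_postal_code` and both word tables are hypothetical and would need tuning against real transcripts.

```python
import re

# Hypothetical tables of words a recognizer might return for spoken
# letters and digits; extend them based on observed transcripts.
SPOKEN_LETTERS = {
    "ay": "A", "be": "B", "bee": "B", "see": "C", "sea": "C", "dee": "D",
    "gee": "G", "jay": "J", "kay": "K", "el": "L", "em": "M", "en": "N",
    "pee": "P", "are": "R", "es": "S", "tea": "T", "tee": "T", "vee": "V",
    "ex": "X", "why": "Y",
}
SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "o": "0", "one": "1", "two": "2", "to": "2",
    "three": "3", "four": "4", "for": "4", "five": "5", "six": "6",
    "seven": "7", "eight": "8", "ate": "8", "nine": "9",
}


def normalize_postal_code(transcript):
    """Map a transcript to 'A1A 1A1' form, or return None if it does not fit."""
    # Split into words and single digits, ignoring spaces and punctuation.
    tokens = re.findall(r"[A-Za-z]+|\d", transcript)
    chars = []
    for token in tokens:
        low = token.lower()
        # Canadian postal codes alternate letter, digit, letter, digit, ...
        expect_letter = len(chars) % 2 == 0
        if expect_letter:
            if len(token) == 1 and token.isalpha():
                chars.append(token.upper())
            elif low in SPOKEN_LETTERS:
                chars.append(SPOKEN_LETTERS[low])
            else:
                return None
        else:
            if token.isdigit():
                chars.append(token)
            elif low in SPOKEN_DIGITS:
                chars.append(SPOKEN_DIGITS[low])
            else:
                return None
    if len(chars) != 6:
        return None
    return "".join(chars[:3]) + " " + "".join(chars[3:])


if __name__ == "__main__":
    print(normalize_postal_code("Em 1 Be 0 Are 3"))  # prints "M1B 0R3"
```

Because the expected position decides whether a token is read as a letter or a digit, an ambiguous word such as "oh" is only treated as 0 where a digit belongs.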



Source: https://stackoverflow.com/questions/45310657/can-the-google-speech-api-be-configured-to-return-only-numbers-letters
