mp3 recognition using Sphinx 4

Question

Can we use MP3 files for voice recognition without using WAV files? Or can we generate a WAV file from an MP3 and then run the recognition without a serious impact on accuracy? The problem is that I need to minimize the load transferred over the network in my application. Will the information lost in the conversion be a major factor for accuracy?

Answer 1:

Can we use MP3 files for voice recognition without using WAV files?

Not directly. To recognize MP3 streams, you need a Java library that reads MP3 and converts it to a PCM stream (for example tritonus-mp3 or lameonj). You can also invoke ffmpeg as a separate process to do the decoding.
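The ffmpeg route can be sketched as follows. This is a minimal example, not a definitive implementation: it assumes ffmpeg is installed and on the PATH, and that the acoustic model expects 16 kHz, 16-bit, mono audio. The class and method names (`Mp3ToPcm`, `buildCommand`, `decode`) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

class Mp3ToPcm {
    // Builds an ffmpeg command that decodes inputPath and writes raw
    // 16-bit little-endian mono PCM at sampleRate Hz to standard output.
    static List<String> buildCommand(String inputPath, int sampleRate) {
        return Arrays.asList(
            "ffmpeg",
            "-loglevel", "error",
            "-i", inputPath,
            "-f", "s16le",            // raw samples, no WAV header
            "-acodec", "pcm_s16le",
            "-ac", "1",               // mono
            "-ar", Integer.toString(sampleRate),
            "-");                     // "-" means write to stdout
    }

    // Runs ffmpeg and returns the decoded PCM bytes.
    // Assumption: ffmpeg is available on the PATH.
    static byte[] decode(String inputPath, int sampleRate)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(inputPath, sampleRate));
        // Send ffmpeg's diagnostics to our stderr so they cannot fill
        // an unread pipe and block the child process.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        try (InputStream out = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = out.read(buffer)) != -1) {
                pcm.write(buffer, 0, n);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ffmpeg exited with code " + exitCode);
        }
        return pcm.toByteArray();
    }
}
```

The returned bytes can be wrapped in a `ByteArrayInputStream` and fed to the recognizer as a raw PCM stream. Since the output has no WAV header, the sample rate and format have to match what the recognizer is configured to expect.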

Or can we generate a WAV file from an MP3 and then run the recognition without a serious impact on accuracy?

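Accuracy is affected in both cases, no matter where you decode the MP3 file. The information is lost when the audio is encoded as MP3, and decoding it back to WAV does not restore it.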

The problem is that I need to minimize the load transferred over the network in my application. Will the information lost in the conversion be a major factor for accuracy?

It's better to use a lossless codec such as FLAC for the transfer, because MP3 compression degrades ASR accuracy. Another approach is to compute the acoustic features on the client and transfer those to the server instead of the audio.
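To get a rough feel for what sending features instead of audio could save, the back-of-the-envelope calculation below compares the two bit rates. The numbers are assumptions chosen for illustration, not values taken from a specific Sphinx configuration: 16 kHz 16-bit mono PCM on one side, and 13 coefficients per frame at 100 frames per second, stored as 32-bit floats, on the other. `BandwidthEstimate` is a hypothetical helper class.

```java
class BandwidthEstimate {
    // Bits per second for uncompressed PCM audio.
    static long pcmBitsPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * bitsPerSample * channels;
    }

    // Bits per second for a stream of feature vectors.
    static long featureBitsPerSecond(int coefficientsPerFrame,
                                     int framesPerSecond,
                                     int bitsPerCoefficient) {
        return (long) coefficientsPerFrame * framesPerSecond * bitsPerCoefficient;
    }

    public static void main(String[] args) {
        // Assumed audio format: 16 kHz, 16-bit, mono.
        long pcm = pcmBitsPerSecond(16000, 16, 1);
        // Assumed feature layout: 13 coefficients, 100 frames/s, 32-bit floats.
        long features = featureBitsPerSecond(13, 100, 32);

        System.out.println("PCM:      " + pcm / 1000.0 + " kbit/s");
        System.out.println("Features: " + features / 1000.0 + " kbit/s");
    }
}
```

Under these assumptions the raw audio needs 256 kbit/s and the feature stream 41.6 kbit/s, before any further compression of either. The actual saving depends on the front end's feature layout.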

Source: https://stackoverflow.com/questions/9047475/mp3-recognition-using-sphinx-4

Tags

mp3

speech-recognition

cmusphinx

sphinx4
