How to split speech into words

这一生的挚爱 posted on 2019-12-04 16:56:06

If you know what the speaker said, you can perform forced alignment to generate word (or phoneme) time alignments. Toolkits such as CMU Sphinx, HTK and Kaldi can do this. If you don't know what the speaker said, you can run standard speech recognition and use the timing information in its output to obtain the word boundaries, although the recognition output may contain errors, and any misrecognized word will also have unreliable boundaries.
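Once an aligner or recognizer has produced word timings, cutting the audio is straightforward. The sketch below is only an illustration: it assumes the timings arrive as `(word, start_frame, end_frame)` tuples with inclusive frame indices and 100 frames per second (10 ms frames), which is a common setup but should be checked against the toolkit you use. The function name `cut_words` and its parameters are hypothetical, not part of any toolkit's API.

```python
from typing import List, Sequence, Tuple


def cut_words(
    samples: Sequence[float],
    sample_rate: int,
    segments: Sequence[Tuple[str, int, int]],
    frames_per_second: int = 100,  # assumption: 10 ms analysis frames
) -> List[Tuple[str, float, float, Sequence[float]]]:
    """Slice audio into per-word chunks from frame-indexed alignments.

    Each segment is (word, start_frame, end_frame), end frame inclusive.
    Returns (word, start_seconds, end_seconds, word_samples) tuples.
    """
    samples_per_frame = sample_rate // frames_per_second
    words = []
    for word, start_frame, end_frame in segments:
        start = start_frame * samples_per_frame
        # end_frame is inclusive, so the chunk runs to the start of the next frame
        end = min((end_frame + 1) * samples_per_frame, len(samples))
        words.append((word, start / sample_rate, end / sample_rate, samples[start:end]))
    return words


if __name__ == "__main__":
    audio = [0.0] * 16000  # one second of placeholder audio at 16 kHz
    alignment = [("hello", 0, 49), ("world", 60, 99)]  # made-up timings
    for word, t0, t1, chunk in cut_words(audio, 16000, alignment):
        print(word, t0, t1, len(chunk))
```

Keeping the conversion in one place makes it easy to adapt if your toolkit reports times in seconds or milliseconds instead of frames.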

akademi4eg

Without prior information about what phrase was spoken, this task is pretty challenging. One option is to apply voice activity detection (VAD) to the speech and split the sound into words at the pauses. But in spontaneous speech people often make no pauses between some words, so there will certainly be problems: several words may end up in a single segment.

Some VAD libraries are suggested here.
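To make the pause-based idea concrete, here is a minimal sketch of a naive energy-threshold VAD, written in plain Python rather than with any particular VAD library. It assumes samples are floats in the range -1 to 1; the function name `split_on_pauses` and the default values for `frame_ms`, `threshold` and `min_pause_ms` are illustrative guesses that would need tuning for real recordings. A dedicated VAD library will usually be more robust to noise.

```python
import math
from typing import List, Sequence, Tuple


def split_on_pauses(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,        # analysis frame length (assumed value)
    threshold: float = 0.02,   # RMS level treated as speech (assumed value)
    min_pause_ms: int = 100,   # shortest silence that ends a segment (assumed)
) -> List[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) for each run of voiced frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    min_pause_frames = max(1, min_pause_ms // frame_ms)
    n_frames = len(samples) // frame_len

    # Mark each frame as voiced when its RMS energy exceeds the threshold.
    voiced = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        voiced.append(rms > threshold)

    segments = []
    start = None   # index of the first frame of the current segment
    silence = 0    # consecutive silent frames seen inside the segment
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                end = i - silence + 1  # first silent frame after the speech
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start, silence = None, 0

    # Close a segment that is still open when the audio ends.
    if start is not None:
        end = n_frames - silence
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

Pauses shorter than `min_pause_ms` are absorbed into the surrounding segment, which is exactly where the approach breaks down for words spoken without a gap between them.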
