Finding the speed and tone of speech in an audio file using Python

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 21:19:34

Question


Given an audio file, I want to calculate the pace of the speech, i.e. how fast or slow it is.

Currently I am doing the following:
- Convert the speech to text to obtain a transcript (using a free tool).
- Count the number of words in the transcript.
- Calculate the duration of the file.
- Finally, pace = (number of words in transcript / duration of file).
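The steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the transcript is assumed to be plain whitespace-separated text, and the duration helper assumes an uncompressed WAV file readable by the standard-library `wave` module.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def speaking_rate_wpm(transcript, duration_seconds):
    """Words per minute for a transcript spoken over duration_seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: words are separated by whitespace.
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds
```

For example, `speaking_rate_wpm(transcript, wav_duration_seconds("speech.wav"))` would give the pace in words per minute, where `speech.wav` is a placeholder file name.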

However, the accuracy of the pace obtained depends entirely on the transcription, which I think is an unnecessary step.

Is there a Python library, or a SoX/ffmpeg approach, that would let me

  • calculate, in a straightforward way, the speed/pace of the talk in an audio file
  • find the dominant pitches/tones of that audio?

I referred to: http://sox.sourceforge.net/sox.html and https://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/


Answer 1:


Your method sounds interesting as a quick first-order approximation, but it is limited by the resolution of the transcript. You can analyze the audio file directly.

I'm not familiar with SoX, but according to its manual the stat effect gives "... time and frequency domain statistical information about the audio".

SoX claims to be the "Swiss Army knife of audio manipulation", and from skimming its docs it seems it might suit you for finding the general tempo.
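As a rough illustration of analyzing the audio directly, one common heuristic is to count peaks in the short-time energy envelope and treat them as a proxy for syllables. The sketch below is not something SoX provides. It assumes the audio has already been decoded into a list of mono float samples, and the function names, frame size, threshold and minimum gap are illustrative choices that would need tuning on real speech.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of consecutive non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_syllable_rate(samples, sample_rate, frame_seconds=0.025,
                           rel_threshold=0.3, min_gap_seconds=0.1):
    """Energy peaks per second, a crude stand-in for syllables per second."""
    frame_len = max(1, round(sample_rate * frame_seconds))
    energy = frame_rms(samples, frame_len)
    if not energy or max(energy) == 0:
        return 0.0
    # Assumption: peaks below 30% of the loudest frame are noise, not speech.
    threshold = rel_threshold * max(energy)
    min_gap = max(1, round(min_gap_seconds / frame_seconds))
    peaks = 0
    last_peak = -min_gap
    for i, e in enumerate(energy):
        prev_e = energy[i - 1] if i > 0 else 0.0
        next_e = energy[i + 1] if i + 1 < len(energy) else 0.0
        # A peak is a loud frame that rises above its left neighbour
        # and is not exceeded by its right neighbour.
        is_peak = e >= threshold and e > prev_e and e >= next_e
        if is_peak and i - last_peak >= min_gap:
            peaks += 1
            last_peak = i
    return peaks / (len(samples) / sample_rate)
```

Because this counts energy bursts rather than words, the result is closer to a syllable rate than to words per minute, and background noise or music will distort it.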

If you want to run pitch analysis too, you can develop your own algorithm in Python. I recently used librosa and found it very useful and well documented.
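To give an idea of what a home-grown pitch algorithm might look like, here is a minimal autocorrelation sketch in plain Python. It is an assumption-laden illustration, not librosa's method: the function names are hypothetical, samples are assumed to be mono floats in the range [-1, 1], and the 75 to 400 Hz search range is a guess at typical speaking pitch. librosa also ships ready-made pitch trackers such as `librosa.yin` and `librosa.pyin`, which are likely to be more robust on real recordings.

```python
import math
import statistics


def estimate_pitch_hz(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame by autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(frame) - 1, int(sample_rate / fmin))
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation; it mildly favours shorter lags,
        # which helps avoid picking a multiple of the true period.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def dominant_pitch_hz(samples, sample_rate, frame_seconds=0.05, min_rms=0.01):
    """Median pitch over the frames that are loud enough to analyze."""
    frame_len = max(1, round(sample_rate * frame_seconds))
    pitches = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Assumption: quiet frames are silence and carry no usable pitch.
        if rms >= min_rms:
            pitch = estimate_pitch_hz(frame, sample_rate)
            if pitch is not None:
                pitches.append(pitch)
    return statistics.median(pitches) if pitches else None
```

Taking the median across frames gives a single "dominant" value that is less sensitive to occasional octave errors than the mean would be.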



Source: https://stackoverflow.com/questions/48220514/finding-speed-and-tone-of-speech-in-an-audio-using-python
