I feel like this is a fairly common problem but I haven't yet found a suitable answer. I have many audio files of human speech that I would like to break on words, which ca
You could look at Audiolab. It provides a decent API for reading the voice samples into numpy arrays.
The Audiolab module uses the libsndfile C library to do the heavy lifting.
You can then scan the arrays for stretches of low-amplitude samples, which correspond to the pauses.
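
Here is a rough sketch of that last step, assuming the samples are already in a mono numpy array scaled to roughly [-1, 1]. It only uses numpy. The function name `find_pauses`, the 20 ms frame size, the 0.02 RMS threshold and the 200 ms minimum pause are all my own placeholder choices, not anything Audiolab provides, so expect to tune them for your recordings.

```python
import numpy as np

def find_pauses(samples, rate, frame_ms=20, threshold=0.02, min_pause_ms=200):
    """Return (start_sec, end_sec) spans where the signal stays quiet.

    All parameter defaults are illustrative guesses, not recommended values.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Cut the signal into fixed-size frames, dropping any partial tail frame.
    frames = np.asarray(samples[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    # RMS energy per frame; quiet frames fall below the threshold.
    rms = np.sqrt((frames ** 2).mean(axis=1))
    quiet = rms < threshold
    # Pad with False so runs touching either end still produce two edges.
    padded = np.concatenate(([False], quiet, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    min_frames = min_pause_ms / frame_ms
    return [
        (float(s * frame_len / rate), float(e * frame_len / rate))
        for s, e in zip(starts, ends)
        if e - s >= min_frames
    ]

# Synthetic check: 1 s of tone, 0.5 s of silence, 1 s of tone at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
signal = np.concatenate([tone, np.zeros(rate // 2), tone])
pauses = find_pauses(signal, rate)
print(pauses)
```

Once you have the pause spans, you can cut the array at the midpoint of each one. Real recordings have background noise, so a fixed threshold may need to be replaced with one derived from the quietest frames in the file.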