The situation
I am using VAD (Voice Activity Detection) from WebRTC via WebRTC-VAD, a Python adapter. The example implementation from the GitHub re
It seems that WebRTC-VAD and its Python wrapper, py-webrtcvad, expect the audio data to be 16-bit little-endian PCM, which is the most common storage format in WAV files.
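If the input is already a 16-bit mono WAV file, its frames are in that format and can be handed to the VAD without conversion. The sketch below uses only Python's standard `wave` module. The helper name `read_pcm16_wav` is hypothetical, and the accepted sample rates (8000, 16000, 32000, 48000 Hz) are taken from the py-webrtcvad README; treat it as an illustration rather than the wrapper's own loading code.

```python
import io
import wave

# Sample rates the WebRTC VAD accepts (as listed in the py-webrtcvad README).
VAD_SAMPLE_RATES = (8000, 16000, 32000, 48000)


def read_pcm16_wav(source):
    """Return (pcm_bytes, sample_rate) from a 16-bit mono WAV file.

    `source` may be a filename or a binary file-like object.
    Hypothetical helper, for illustration only.
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wf.getframerate()
        if sample_rate not in VAD_SAMPLE_RATES:
            raise ValueError(f"unsupported sample rate: {sample_rate}")
        # 16-bit WAV data is stored as little-endian PCM, so no conversion is needed.
        return wf.readframes(wf.getnframes()), sample_rate


# Usage: build a tiny in-memory WAV (two samples: 1 and -1) and read it back.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x01\x00\xff\xff")
buf.seek(0)
pcm, rate = read_pcm16_wav(buf)
```

The size and channel checks are there because the VAD only accepts 16-bit mono audio; a stereo or 8-bit file would otherwise be passed through silently and give meaningless results.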
librosa and its underlying I/O library pysoundfile, however, always return floating-point arrays in the range [-1.0, 1.0]. To convert these to bytes containing 16-bit PCM, you can use the following float_to_pcm16 function.
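    import numpy

    def float_to_pcm16(audio):
        # Clip to [-1.0, 1.0] so out-of-range samples don't wrap around in int16
        ints = (numpy.clip(audio, -1.0, 1.0) * 32767).astype(numpy.int16)
        # Force little-endian signed 16-bit integers ('<i2') and return the raw bytes
        little_endian = ints.astype('<i2')
        return little_endian.tobytes()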
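The VAD judges audio one frame at a time, and according to the py-webrtcvad README a frame must be exactly 10, 20 or 30 ms long. A minimal sketch of cutting the converted bytes into such frames follows. `float_to_pcm16` is repeated so the block runs on its own, and `pcm16_frames` is a hypothetical helper name; each yielded frame is what would be passed to `webrtcvad.Vad(...).is_speech(frame, sample_rate)`.

```python
import numpy


def float_to_pcm16(audio):
    # Same conversion as above: clip, scale, little-endian int16 bytes.
    ints = (numpy.clip(audio, -1.0, 1.0) * 32767).astype(numpy.int16)
    return ints.astype('<i2').tobytes()


def pcm16_frames(pcm, sample_rate, frame_ms=30):
    """Yield consecutive frame_ms-long chunks of 16-bit mono PCM bytes.

    A trailing partial frame is dropped, since the VAD only accepts
    frames of exactly 10, 20 or 30 ms. Hypothetical helper.
    """
    if frame_ms not in (10, 20, 30):
        raise ValueError("frame_ms must be 10, 20 or 30")
    # Samples per frame times 2 bytes per 16-bit sample.
    frame_bytes = sample_rate * frame_ms // 1000 * 2
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of a 440 Hz tone at 16 kHz, as a float array like librosa returns.
sample_rate = 16000
t = numpy.arange(sample_rate) / sample_rate
audio = 0.5 * numpy.sin(2 * numpy.pi * 440 * t)

pcm = float_to_pcm16(audio)
frames = list(pcm16_frames(pcm, sample_rate, frame_ms=30))
# Each frame would then go to webrtcvad.Vad(...).is_speech(frame, sample_rate).
```

Note that `librosa.load` resamples to 22050 Hz by default, which is not one of the rates the VAD accepts, so passing something like `sr=16000` when loading is probably needed before this conversion.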