I have to downsample a WAV file from 44100 Hz to 16000 Hz without using any external Python libraries, so preferably with wave and/or audioop. I tried jus
I tried using Librosa, but for some reason, even after running y, s = librosa.load('test.wav', sr=16000) and librosa.output.write_wav(filename, y, sr), the sound files are not saved with the given sample rate (16000, downsampled from 44 kHz).
But pydub works well. It's an awesome library by jiaaro. I used the following commands:
from pydub import AudioSegment as am
sound = am.from_file(filepath, format='wav', frame_rate=22050)
sound = sound.set_frame_rate(16000)
sound.export(filepath, format='wav')
The code above reads the file with a frame_rate of 22050 and changes the rate to 16000; the export function then overwrites the existing file with the resampled audio at the new frame_rate. It works better than librosa. I am looking for ways to compare the speed of the two packages, but I haven't figured that out yet since I have very little data.
Reference: https://github.com/jiaaro/pydub/issues/232
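To confirm that a file really was saved at the requested rate, you can read the header back with the standard wave module. This is only a minimal sketch; the helper name wav_info is made up for illustration and is not part of pydub or librosa.

```python
import wave


def wav_info(path):
    """Return (frame_rate, n_channels, duration_in_seconds) from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate
```

Calling wav_info(filepath) before and after the export should show the rate changing while the duration stays (almost) the same.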
You can use Librosa's load() function,
import librosa
y, s = librosa.load('test.wav', sr=8000) # Downsample 44.1kHz to 8kHz
The extra effort to install Librosa is probably worth the peace of mind.
Pro-tip: when installing Librosa on Anaconda, you need to install ffmpeg as well, so
pip install librosa
conda install -c conda-forge ffmpeg
This saves you the NoBackendError() error.
To downsample your signal (reduce the sampling rate, also called decimating) or upsample it (increase the sampling rate), you need to interpolate between your data points.
The idea is that you need to somehow draw a curve between your points and then take values from this curve at the new sampling rate. You want to know the value of the sound wave at times that weren't sampled, so you have to estimate it one way or another. The only case where subsampling is easy is when you divide the sampling rate by an integer $k$. In that case, you just take buckets of $k$ samples and keep only the first one. But this won't answer your question. See the picture below where you have a curve sampled at two different scales.
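The integer case can be sketched in a few lines of plain Python. Both function names are hypothetical, and the averaging variant is my own addition: it acts as a crude low-pass filter, which helps because simply dropping samples can alias high frequencies into the result.

```python
def decimate_keep_first(samples, k):
    """Keep the first sample of every bucket of k samples."""
    return samples[::k]


def decimate_average(samples, k):
    """Replace every full bucket of k samples by its mean (crude low-pass)."""
    usable = len(samples) - len(samples) % k  # ignore a trailing partial bucket
    return [sum(samples[i:i + k]) / k for i in range(0, usable, k)]
```

This covers 44100 Hz to 22050 Hz ($k = 2$), but not 44100 Hz to 16000 Hz, because 44100 / 16000 is not an integer.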
You could do it by hand if you understand the principle, but I strongly recommend using a library. The reason is that interpolating the right way is neither easy nor obvious.
You could use linear interpolation (connect points with a line), quadratic interpolation (connect three points with a piece of a polynomial), or (sometimes the best for sound) a Fourier transform and interpolate in frequency space. Since a Fourier transform isn't something you want to rewrite by hand, use a library if you want good subsampling/upsampling. See the following picture for two upsampled curves, each produced by a different algorithm from scipy. The resample function uses a Fourier transform.
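For illustration only, here is what the simplest option, linear interpolation, might look like without any library. The function name resample_linear is made up, the input is assumed to be a mono sequence of numbers, and no low-pass filtering is done, so downsampling this way can alias.

```python
def resample_linear(samples, in_rate, out_rate):
    """Resample a mono sequence by linear interpolation between neighbours."""
    if not samples:
        return []
    n_out = round(len(samples) * out_rate / in_rate)
    step = in_rate / out_rate  # spacing of output samples, in input samples
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        pos = j * step                   # fractional position in the input
        i = min(int(pos), last)          # left neighbour
        frac = pos - i                   # distance from the left neighbour
        right = samples[min(i + 1, last)]
        out.append(samples[i] * (1 - frac) + right * frac)
    return out
```

The output keeps the duration of the input: one second at 44100 Hz becomes 16000 samples at 16000 Hz.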
I was in exactly this situation: I was loading a 44100 Hz wave file and needed data sampled at 48000 Hz, so I wrote the following few lines to load my data:
# Imports
from scipy.io import wavfile
import scipy.signal as sps
# Your new sampling rate
new_rate = 48000
# Read file
sampling_rate, data = wavfile.read(path)
# Resample data
number_of_samples = round(len(data) * float(new_rate) / sampling_rate)
data = sps.resample(data, number_of_samples)
Note that you can also use the decimate function if you are only downsampling by an integer factor and want something faster than the Fourier method.
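Since scipy.signal.decimate takes an integer factor, it does not fit 44100 Hz to 16000 Hz directly. One alternative that I believe fits this case is scipy.signal.resample_poly, which resamples by a rational factor up/down with a polyphase filter. The wrapper below is only a sketch, and the name resample_rational is made up.

```python
from fractions import Fraction

import numpy as np
import scipy.signal as sps


def resample_rational(data, in_rate, out_rate):
    """Resample along axis 0 by the reduced fraction out_rate / in_rate."""
    ratio = Fraction(out_rate, in_rate)  # 16000/44100 reduces to 160/441
    return sps.resample_poly(data, ratio.numerator, ratio.denominator, axis=0)


# Example: one second of a 440 Hz tone, 44100 Hz -> 16000 Hz
tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
tone_16k = resample_rational(tone, 44100, 16000)
```

The result is floating point, so convert it back to the integer type of your file before writing it.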
Thank you all for your answers. I have already found a solution and it works very well. Here is the whole function.
import audioop  # standard library up to Python 3.12; removed in 3.13
import os
import wave


def downsampleWav(src, dst, inrate=44100, outrate=16000, inchannels=2, outchannels=1):
    if not os.path.exists(src):
        print('Source not found!')
        return False

    # dirname is '' when dst has no directory part, so guard against that
    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    try:
        s_read = wave.open(src, 'rb')
        s_write = wave.open(dst, 'wb')
    except Exception:
        print('Failed to open files!')
        return False

    n_frames = s_read.getnframes()
    data = s_read.readframes(n_frames)

    try:
        # ratecv returns (converted_bytes, state); only the bytes are needed
        converted = audioop.ratecv(data, 2, inchannels, inrate, outrate, None)[0]
        if inchannels == 2 and outchannels == 1:
            # factors (1, 0) keep the left channel and drop the right one
            converted = audioop.tomono(converted, 2, 1, 0)
    except Exception:
        print('Failed to downsample wav')
        return False

    try:
        s_write.setparams((outchannels, 2, outrate, 0, 'NONE', 'Uncompressed'))
        s_write.writeframes(converted)
    except Exception:
        print('Failed to write wav')
        return False

    try:
        s_read.close()
        s_write.close()
    except Exception:
        print('Failed to close wav files')
        return False

    return True
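Note that audioop was removed from the standard library in Python 3.13, so on newer interpreters the channel step needs a replacement. As a rough sketch (the function name is made up, and it assumes 16-bit interleaved stereo PCM), the array module can do the stereo-to-mono mix. Unlike tomono with factors (1, 0), which keeps only the left channel, this version averages both channels.

```python
import array
import sys


def stereo_to_mono_16bit(frames):
    """Average left and right samples of interleaved 16-bit stereo PCM bytes."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV sample data is little-endian
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2
         for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()
```

The returned bytes can be passed to writeframes on a wave writer configured with one channel and a sample width of 2.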
You can use resample in scipy. It's a bit of a headache to do, because there's some type conversion to be done between the bytestring native to Python and the arrays needed in scipy. There's another headache: in Python's wave module, there is no way to tell whether the data is signed or not (only whether it's 8 or 16 bits). It might (should) work for both, but I haven't tested it.
Here's a small program which converts 8- and 16-bit mono files from 44.1 kHz to 16 kHz. If you have stereo, or use other formats, it shouldn't be that difficult to adapt. Edit the input/output names at the start of the code. I never got around to using command-line arguments.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# downsample.py
#
# Copyright 2015 John Coppens <john@jcoppens.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
inwave = "sine_44k.wav"
outwave = "sine_16k.wav"
import wave
import numpy as np
import scipy.signal as sps
class DownSample():
    def __init__(self):
        self.in_rate = 44100.0
        self.out_rate = 16000.0

    def open_file(self, fname):
        try:
            self.in_wav = wave.open(fname)
        except Exception:
            print("Cannot open wav file (%s)" % fname)
            return False

        if self.in_wav.getframerate() != self.in_rate:
            print("Frame rate is not %d (it's %d)" %
                  (self.in_rate, self.in_wav.getframerate()))
            return False

        self.in_nframes = self.in_wav.getnframes()
        print("Frames: %d" % self.in_wav.getnframes())

        # PCM WAV files store 8-bit samples unsigned and 16-bit samples signed
        if self.in_wav.getsampwidth() == 1:
            self.nptype = np.uint8
        elif self.in_wav.getsampwidth() == 2:
            self.nptype = np.int16
        else:
            print("Unsupported sample width")
            return False

        return True

    def resample(self, fname):
        self.out_wav = wave.open(fname, "w")
        self.out_wav.setframerate(self.out_rate)
        self.out_wav.setnchannels(self.in_wav.getnchannels())
        self.out_wav.setsampwidth(self.in_wav.getsampwidth())
        print("Nr output channels: %d" % self.out_wav.getnchannels())

        # np.frombuffer replaces the deprecated np.fromstring
        audio = np.frombuffer(self.in_wav.readframes(self.in_nframes), self.nptype)
        # Count samples, not bytes (16-bit audio has two bytes per sample)
        nroutsamples = round(len(audio) * self.out_rate / self.in_rate)
        print("Nr output samples: %d" % nroutsamples)

        audio_out = sps.resample(audio, nroutsamples)
        # Clip to the valid range before casting back to the integer type
        info = np.iinfo(self.nptype)
        audio_out = np.clip(np.round(audio_out), info.min, info.max).astype(self.nptype)

        self.out_wav.writeframes(audio_out.tobytes())
        self.out_wav.close()


def main():
    ds = DownSample()
    if not ds.open_file(inwave):
        return 1
    ds.resample(outwave)
    return 0


if __name__ == '__main__':
    main()