librosa | 易学教程

How to resample a .wav sound file which is being read using the wavfile.read?

Question: I want to change the following two lines of my code:

    clip, sample_rate = librosa.load(file_name)
    clip = librosa.resample(clip, orig_sr=sample_rate, target_sr=2000)

I want to load the .wav file using wavfile.read() instead of librosa.load(), and then resample it using some technique other than librosa.resample(). Any idea how to do it?

Answer 1: So here is the answer, folks! The solution below worked for me.

    from scipy.io import wavfile
    import scipy.signal as sps
    from io import BytesIO

    new_rate = 2000
    # Read
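The answer excerpt is cut off after `# Read`, so the rest of the original solution is not shown here. As a minimal sketch of one way to do it (not necessarily what the answer went on to do), the snippet below reads a WAV with `scipy.io.wavfile.read` and resamples it with `scipy.signal.resample`. The helper name `resample_wav`, the mono downmix, and the in-memory test tone are assumptions made for illustration.

```python
from io import BytesIO

import numpy as np
import scipy.signal as sps
from scipy.io import wavfile


def resample_wav(source, new_rate=2000):
    """Hypothetical helper: read a WAV (path or file-like) and resample it."""
    sample_rate, clip = wavfile.read(source)
    clip = clip.astype(np.float32)
    # Assumption: downmix multi-channel audio to mono by averaging channels.
    if clip.ndim > 1:
        clip = clip.mean(axis=1)
    # Number of samples the clip should have at the new rate.
    n_samples = int(round(len(clip) * new_rate / sample_rate))
    # FFT-based resampling; scipy.signal.resample_poly is an alternative.
    return sps.resample(clip, n_samples), new_rate


# Usage: a 1-second 440 Hz tone at 8000 Hz, written to an in-memory WAV.
buffer = BytesIO()
t = np.arange(8000) / 8000
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
wavfile.write(buffer, 8000, tone)
buffer.seek(0)
clip, rate = resample_wav(buffer, new_rate=2000)
```

Note that `wavfile.read` returns samples in the file's native dtype (for example int16), whereas `librosa.load` returns floats scaled to [-1, 1], so the values will differ in scale unless they are normalized.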

How do you determine the correct dimension of Mel Spectrogram Feature Extraction for NN

Question: I am trying to implement Mel spectrogram feature extraction:

    import librosa

    n_mels = 128

    # Extract the Mel-frequency spectrum for every file
    def extract_features(file_name):
        try:
            audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
            mely = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=n_mels)
        except Exception as e:
            print("Error encountered while parsing file: ", file_name)
            return None
        return mely.T

It appears that I am implementing this feature extraction incorrectly as when I check
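The question is cut off before it states the shapes observed, so the following is only a sketch of how the output size can be predicted. It assumes librosa's defaults for `melspectrogram` (`hop_length=512`, `center=True`), under which the number of frames is `1 + n_samples // hop_length`. The function name `mel_feature_shape` is hypothetical.

```python
def mel_feature_shape(n_samples, n_mels=128, hop_length=512, transpose=True):
    """Predict the shape of a mel spectrogram for a clip of n_samples samples.

    Assumes center=True (librosa's default), so frames = 1 + n_samples // hop.
    """
    n_frames = 1 + n_samples // hop_length
    # extract_features returns mely.T, i.e. (frames, n_mels).
    return (n_frames, n_mels) if transpose else (n_mels, n_frames)


# Usage: a 4-second clip at librosa's default 22050 Hz has 88200 samples.
shape = mel_feature_shape(4 * 22050)
print(shape)  # (173, 128)
```

Because the frame count depends on clip length, files of different durations yield different first dimensions, which is why fixed-size network inputs typically require padding or truncating to a common length.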

librosa installation via pip failing

Question: Python version is 3.4.2.

    (env) ishandutta2007@MacBook-Pro:~/Documents/Projects/my_proj$ pip install librosa
    Collecting librosa
    Collecting joblib>=0.12 (from librosa)
      Using cached https://files.pythonhosted.org/packages/69/91/d217cec1fe6eac525ca964cd67e4f79b1d4ce68b64cb82d0b9ae1af2311e/joblib-0.12.5-py2.py3-none-any.whl
    Collecting numba>=0.38.0 (from librosa)
    Collecting scikit-learn!=0.19.0,>=0.14.0 (from librosa)
      Using cached https://files.pythonhosted.org/packages/9b/bc

How to convert a mel spectrogram to log-scaled mel spectrogram

Question: I was reading this paper on environmental noise discrimination using convolutional neural networks and wanted to reproduce their results. They convert WAV files into log-scaled mel spectrograms. How do you do this? I am able to convert a WAV file to a mel spectrogram:

    y, sr = librosa.load('audio/100263-2-0-117.wav', duration=3)
    ps = librosa.feature.melspectrogram(y=y, sr=sr)
    librosa.display.specshow(ps, y_axis='mel', x_axis='time')

I am also able to display it as a log-scaled spectrogram: librosa
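The excerpt ends before any answer, so what follows is a sketch rather than the accepted solution. In librosa, a power mel spectrogram is usually converted to decibels with `librosa.power_to_db`. The NumPy function below approximates that conversion so the arithmetic is visible; the name `power_to_db_sketch` and the toy input are assumptions for illustration.

```python
import numpy as np


def power_to_db_sketch(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Approximate a power-to-decibel conversion: 10 * log10(S / ref)."""
    S = np.asarray(S, dtype=float)
    # Clip tiny values to amin so log10 never sees zero.
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    # Limit the dynamic range to top_db below the peak.
    if top_db is not None:
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


# Usage: a toy 2x2 "power spectrogram" standing in for ps.
ps = np.array([[1.0, 10.0], [100.0, 0.0]])
log_ps = power_to_db_sketch(ps)
```

With a real mel spectrogram, the same step would typically be `librosa.power_to_db(ps, ref=np.max)`, and the result can be passed to `librosa.display.specshow` as before.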

MPEG Audio Constant bit rate conversion

Question: I am trying to convert a few .wav files to .mp3 format. The desired .mp3 format is: I tried FFmpeg with this command:

    ffmpeg -i input.wav -vn -ac 2 -b:a 160k output1.mp3

This is the output of this command on one .wav file. I am getting a result, but two things are different: the overall bit rate mode and the writing library.

Writing library: LAME3.99.5 vs. LAME3.100 (I think this shouldn't cause any problem?)
Bit rate mode: constant vs. variable

How do I change the bit rate mode from variable to constant?

What are the components of the Mel mfcc

Question: In looking at the output of this line of code:

    mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
    print("MFCC Shape = ", mfccs.shape)

I get a response of MFCC Shape = (40, 1876). What do these two numbers represent? I looked at the librosa website but still could not decipher what these two values are. Any insights will be greatly appreciated!

Answer 1: The first dimension (40) is the number of MFCC coefficients, and the second dimension (1876) is the number of time
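To make the second dimension concrete, the sketch below converts a frame count back into an approximate clip duration. It assumes librosa's defaults (`sr=22050`, `hop_length=512`, `center=True`), none of which are stated in the excerpt; the actual `librosa_sample_rate` in the question is unknown, and the function name is hypothetical.

```python
def frames_to_seconds(n_frames, sr=22050, hop_length=512):
    """Approximate clip duration implied by a frame count.

    Assumes center=True, where n_frames = 1 + n_samples // hop_length,
    so the clip holds at least (n_frames - 1) * hop_length samples.
    """
    return (n_frames - 1) * hop_length / sr


# Usage: the 1876 frames from the question, under the assumed defaults.
duration = frames_to_seconds(1876)
print(round(duration, 2))  # 43.54
```

Under these assumptions each column of the MFCC matrix describes one analysis frame, spaced 512 samples (about 23 ms) apart.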

Getting the frequencies associated with STFT in Librosa

Question: When using librosa.stft() to calculate a spectrogram, how does one get back the associated frequency values? I am not interested in generating an image as in librosa.display.specshow; rather, I want to have those values in hand.

    y, sr = librosa.load('../recordings/high_pitch.m4a')
    stft = librosa.stft(y, n_fft=256, window=sig.windows.hamming)
    spec = np.abs(stft)

spec gives me the 'amplitude' or 'power' of each frequency, but not the frequency bins themselves. I have seen that there is a
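The excerpt stops mid-sentence, so this is a sketch of one common approach rather than the answer from the original thread. The rows of an STFT with `n_fft` points correspond to `1 + n_fft // 2` evenly spaced frequencies from 0 Hz to `sr / 2`, which NumPy can compute directly with `np.fft.rfftfreq`. The sample rate of 22050 Hz is an assumption (it is the `librosa.load` default), and the helper name is hypothetical.

```python
import numpy as np


def stft_bin_frequencies(sr=22050, n_fft=256):
    """Center frequency in Hz of each STFT row (bin) for a real-valued signal."""
    return np.fft.rfftfreq(n_fft, d=1.0 / sr)


# Usage: matches the n_fft=256 used in the question.
freqs = stft_bin_frequencies(sr=22050, n_fft=256)
print(freqs.shape)  # (129,)
```

Row `i` of `spec` then corresponds to `freqs[i]`. librosa exposes the same values through `librosa.fft_frequencies(sr=sr, n_fft=256)`.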