mfcc

Applying neural network to MFCCs for variable-length speech segments

空扰寡人 posted on 2019-12-08 03:10:29
I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs. At the moment, I'm using 26 coefficients for each sample, and a total of 5 different classes - these are five different words with varying numbers of syllables. While each sample is 2 seconds long, I am unsure how to handle cases where the user can pronounce words either very slowly or very quickly. E.g., the word 'television' spoken within 1 second yields different coefficients than the word spoken within two seconds. Any advice on how I can solve this problem would be much

Spectrograms generated using Librosa don't look consistent with Kaldi?

橙三吉。 posted on 2019-12-07 19:30:22
Question: I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, with 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram, visualized via the MATLAB imagesc function, appears as below: I am experimenting with using Librosa as an alternative to Kaldi. I set up my code as below, using the same number of bins, sampling rate, and window length / shift as above.
    time_series, sample_rate = librosa.core.load("7a.wav",sr=20000)
    spectrogram = librosa.feature

Mel Frequency Cepstral Coefficients (MFCC) in C/C++

霸气de小男生 posted on 2019-12-07 03:13:16
Question: Is there any implementation of MFCC available in C/C++? Any source code or libraries? I've already found http://code.google.com/p/libmfcc/ which seems to be good.
Answer 1: A recap in 2016: libmfcc is simple, MIT license, unsupported since 2010. YAAFE provides MFCCs and other features, LGPLv3, unsupported since 2011. Kaldi is overkill, but it can be used just for the MFCC; Apache License v2.0, and still supported. PocketSphinx is the CMU toolkit for speech recognition, CMU license (BSD-style), and

Spectrograms generated using Librosa don't look consistent with Kaldi?

点点圈 posted on 2019-12-06 12:59:27
I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, with 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram, visualized via the MATLAB imagesc function, appears as below: I am experimenting with using Librosa as an alternative to Kaldi. I set up my code as below, using the same number of bins, sampling rate, and window length / shift as above.
    time_series, sample_rate = librosa.core.load("7a.wav",sr=20000)
    spectrogram = librosa.feature.melspectrogram(time_series, sr=20000, n_mels=23, n_fft=500, hop_length=200)
    log_S = librosa.core.logamplitude
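The window and shift quoted in the question map onto the `n_fft=500` and `hop_length=200` arguments by simple arithmetic. The following is a minimal sketch of that conversion; the helper name `ms_to_samples` is hypothetical and is not part of Librosa or Kaldi.

```python
def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000.0))


sample_rate = 20000  # Hz, as stated in the question
n_fft = ms_to_samples(25, sample_rate)       # 25 ms analysis window
hop_length = ms_to_samples(10, sample_rate)  # 10 ms frame shift
print(n_fft, hop_length)
```

So the sample counts in the question do match the stated window and shift; any visual difference between the two spectrograms would have to come from other settings.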

Meaning of MFCC

限于喜欢 posted on 2019-12-06 11:23:02
Question: I have a conceptual problem. I know what the mel scale is and what it represents, and I know that this kind of spectrogram still carries too much information for what I need. I think that if we want to reduce the amount of information in the spectrogram, we use the MFCC. But I really don't get what the MFCC is and what it represents. I use an MFCC matrix in a speech recognition process, but I don't understand what all of the numbers inside that vector represent. The array is 13x130 and I don't know what all

MFCC code + first- and second-order differences (MATLAB code)

孤街醉人 posted on 2019-12-05 22:00:27
    clc; close all; clear all;
    [x, fs] = audioread('C:\Users\Administrator\Desktop\waves_yesno\0_0_0_1_0_0_0_1.wav');
    % mel filter bank (melbankm and enframe are VOICEBOX toolbox functions)
    bank = melbankm(24, 256, fs, 0, 0.5, 'm');
    bank_1 = full(bank);
    bank_2 = bank_1 / max(bank_1(:));
    % DCT coefficients: 12 cepstral rows over 24 filter outputs
    for k = 1 : 12
        n = 0 : 23;
        dctcoef(k, :) = cos((2 * n + 1) * k * pi / (2 * 24));
    end
    % lifter weights, normalized to a maximum of 1
    w = 1 + 6 * sin(pi * [1 : 12] ./ 12);
    w2one = w / max(w);
    x2double = double(x);
    x2emph = filter([1 -0.98], 1, x2double);   % pre-emphasis
    x2enfra = enframe(x2emph, 256, 80);        % 256-sample frames, 80-sample shift
    for i = 1 : size(x2enfra, 1)
        y = x2enfra(i, :);                     % i-th frame
        s = y' .* hamming(256);                % note: hamming(256) is a column vector
        % y2rota = y';
        t = abs(fft(s));
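For readers who prefer Python, the DCT matrix and lifter weights built in the visible part of the MATLAB listing can be sketched in plain Python as below. This is an illustrative translation using the same constants (24 filters, 12 coefficients), not the author's code; the names `dct_matrix` and `lifter_weights` are hypothetical.

```python
import math

N_FILTERS = 24  # mel filters, as in melbankm(24, ...)
N_CEPS = 12     # cepstral coefficients kept


def dct_matrix(n_ceps=N_CEPS, n_filters=N_FILTERS):
    """Row k-1 holds cos((2n + 1) * k * pi / (2 * n_filters)) for n = 0..n_filters-1."""
    return [
        [math.cos((2 * n + 1) * k * math.pi / (2 * n_filters)) for n in range(n_filters)]
        for k in range(1, n_ceps + 1)
    ]


def lifter_weights(n_ceps=N_CEPS):
    """Sinusoidal lifter 1 + 6*sin(pi*k/n_ceps), scaled so the largest weight is 1."""
    w = [1 + 6 * math.sin(math.pi * k / n_ceps) for k in range(1, n_ceps + 1)]
    w_max = max(w)
    return [v / w_max for v in w]


dctcoef = dct_matrix()
w2one = lifter_weights()
print(len(dctcoef), len(dctcoef[0]), max(w2one))
```

The DCT matrix has one row per cepstral coefficient and one column per mel filter, so multiplying it by a vector of 24 log filter-bank energies gives 12 cepstral values for that frame.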

Meaning of MFCC

試著忘記壹切 posted on 2019-12-04 17:28:31
I have a conceptual problem. I know what the mel scale is and what it represents, and I know that this kind of spectrogram still carries too much information for what I need. I think that if we want to reduce the amount of information in the spectrogram, we use the MFCC. But I really don't get what the MFCC is and what it represents. I use an MFCC matrix in a speech recognition process, but I don't understand what all of the numbers inside that vector represent. The array is 13x130 and I don't know what all these floats mean. I understand that the longer my audio track is, the bigger my matrix is (e.g. 13x250,
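The shape of the matrix can be read as (coefficients x frames): each column describes one short analysis frame, so a longer recording produces more columns while the number of rows stays fixed. A minimal sketch of that arithmetic follows. The 22050 Hz sample rate, the 512-sample hop, and the centered-framing formula are assumptions for illustration only (the question does not say how the features were extracted), and `n_frames` is a hypothetical helper.

```python
def n_frames(duration_s: float, sample_rate: int = 22050, hop_length: int = 512) -> int:
    """Frame count for centered framing: one frame per hop, plus the first frame."""
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


n_mfcc = 13  # rows: coefficients kept per frame
for seconds in (3.0, 6.0):
    # columns grow with duration; rows do not
    print(seconds, (n_mfcc, n_frames(seconds)))
```

Under these assumed settings a 3-second clip gives 130 frames, which would be consistent with the 13x130 shape mentioned in the question.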

Matching two series of Mfcc coefficients

落花浮王杯 posted on 2019-12-04 16:25:40
I have extracted two series of MFCC coefficients from two audio files, each around 30 seconds long, consisting of the same speech content. The audio files were recorded at the same location from different sources. An estimate should be made of whether the audio contains the same conversation or a different conversation. So far I have tested a correlation calculation on the two MFCC series, but the result is not very reasonable. Are there best practices for this scenario?
I had the same problem, and the solution was to match the two arrays of MFCCs using the Dynamic Time Warping algorithm. After
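The Dynamic Time Warping approach mentioned in the answer can be sketched as follows. This is a minimal textbook version in plain Python, assuming a Euclidean local cost and the basic match/insert/delete step pattern; it is not the answerer's code, and the toy vectors stand in for real MFCC frames.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors.

    a, b: lists of equal-dimension vectors (e.g. one MFCC vector per frame).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy data standing in for MFCC frames (hypothetical values, 2 coefficients per frame).
fast = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
slow = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
other = [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]

print(dtw_distance(fast, slow))   # same content at a different speed
print(dtw_distance(fast, other))  # different content
```

Because the alignment may stretch or compress time, two renditions of the same content at different speeds can score a low cost where a frame-by-frame correlation would not. This quadratic-time version is only practical for short sequences; for 30-second recordings a windowed or library implementation is probably needed.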

How to plot MFCC in Python?

烂漫一生 posted on 2019-12-03 12:57:47
Question: I'm just a beginner in signal processing. Here is my code so far for extracting MFCC features from an audio file (.WAV):
    from python_speech_features import mfcc
    import scipy.io.wavfile as wav
    (rate,sig) = wav.read("AudioFile.wav")
    mfcc_feat = mfcc(sig,rate)
    print(mfcc_feat)
I just wanted to plot the MFCC features to see what they look like.
Answer 1:
    from python_speech_features import mfcc
    import scipy.io.wavfile as wav
    import matplotlib.pyplot as plt
    (rate,sig) = wav.read("AudioFile.wav")
    mfcc

How to plot MFCC in Python?

孤街浪徒 posted on 2019-12-03 03:54:33
I'm just a beginner in signal processing. Here is my code so far for extracting MFCC features from an audio file (.WAV):
    from python_speech_features import mfcc
    import scipy.io.wavfile as wav
    (rate,sig) = wav.read("AudioFile.wav")
    mfcc_feat = mfcc(sig,rate)
    print(mfcc_feat)
I just wanted to plot the MFCC features to see what they look like.
    from python_speech_features import mfcc
    import scipy.io.wavfile as wav
    import matplotlib.pyplot as plt
    (rate,sig) = wav.read("AudioFile.wav")
    mfcc_feat = mfcc(sig,rate)
    print(mfcc_feat)
    plt.plot(mfcc_feat)
    plt.show()
This will plot the MFCC as colors,