Applying neural network to MFCCs for variable-length speech segments
I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs. At the moment, I'm using 26 coefficients for each sample, and a total of 5 different classes - these are five different words with varying numbers of syllables. While each sample is 2 seconds long, I am unsure how to handle cases where the user can pronounce words either very slowly or very quickly. E.g., the word 'television' spoken within 1 second yields different coefficients than the word spoken within two seconds. Any advice on how I can solve this problem would be much