mfcc

ValueError: could not broadcast input array from shape (20,590) into shape (20)

一个人想着一个人 · submitted on 2021-02-10 06:37:07
Question: I am trying to extract features from .wav files by using the MFCCs of the sound files. I am getting an error when I try to convert my list of MFCCs to a numpy array. I am quite sure that this error occurs because the list contains MFCC arrays with different shapes (but I am unsure of how to solve the issue). I have looked at two other Stack Overflow posts; however, these don't solve my problem because they are too specific to a particular task. ValueError: could not broadcast input array from shape (20,590) into shape (20)
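A common fix (my suggestion, not something from the post) is to pad or truncate every MFCC matrix along the time axis to a fixed number of frames before stacking. A minimal sketch, assuming the MFCCs come from librosa and that max_frames (here 590) and the list of file paths are placeholders you would supply yourself:

```python
import numpy as np
import librosa

def mfcc_fixed_length(path, n_mfcc=20, max_frames=590):
    """Compute MFCCs and pad/truncate the time axis so every clip has the same shape."""
    y, sr = librosa.load(path)                               # resampled to 22050 Hz by default
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    if mfcc.shape[1] < max_frames:
        # zero-pad short clips on the right along the time axis
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        # truncate longer clips
        mfcc = mfcc[:, :max_frames]
    return mfcc

# wav_files is a hypothetical list of .wav paths; every element now has shape (20, 590),
# so np.array can stack them into (len(wav_files), 20, 590) without the broadcast error:
# features = np.array([mfcc_fixed_length(f) for f in wav_files])
```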

What are the components of the Mel mfcc

百般思念 · submitted on 2021-01-29 17:08:24
Question: In looking at the output of this line of code: mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40) print("MFCC Shape = ", mfccs.shape) I get a response of MFCC Shape = (40, 1876). What do these two numbers represent? I looked at the librosa website but still could not decipher what these two values are. Any insights will be greatly appreciated! Answer 1: The first dimension (40) is the number of MFCC coefficients, and the second dimension (1876) is the number of time frames.
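To make the two dimensions concrete (my addition, not part of the answer): with librosa's defaults (hop_length=512, center=True) the number of frames works out to 1 + len(y) // hop_length, which the following sketch checks on a synthetic signal:

```python
import numpy as np
import librosa

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(10 * sr) / sr)   # 10 s test tone at 440 Hz

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)      # default hop_length=512, center=True
n_mfcc, n_frames = mfccs.shape

print(n_mfcc)                             # 40  -> one row per MFCC coefficient
print(n_frames, 1 + len(y) // 512)        # both 431 -> frames = 1 + len(y) // hop_length
```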

What is the second number in the MFCCs array?

二次信任 · submitted on 2020-11-29 10:18:04
Question: When I extract MFCCs from an audio file the output is (13, 22). What does the second number represent? Is it time frames? I use librosa. The code I use is: mfccs = librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13, hop_length=256) print(mfccs.shape) And the output is (13, 22). Answer 1: Yes, it is time frames, and it mainly depends on how many samples you provide via y and what hop_length you choose. Example: say you have 10 s of audio sampled at 44.1 kHz (CD quality). When you load it with librosa, it is resampled to 22050 Hz by default, which gives 220500 samples.
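Continuing that arithmetic as a sketch of my own (not the answerer's code): with 220500 samples, hop_length=256 and centered frames, you get 1 + 220500 // 256 = 862 time frames:

```python
import numpy as np
import librosa

sr = 22050                      # librosa.load()'s default sampling rate
y = np.zeros(10 * sr)           # stand-in for 10 s of loaded audio

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=256)
print(mfccs.shape)              # (13, 862): 1 + 220500 // 256 = 862 time frames
```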

Miscellaneous Notes on Music Classification/Generation

那年仲夏 · submitted on 2020-03-19 09:38:04
In any automatic speech recognition system, the first step is usually feature extraction: isolating the discriminative components of the audio signal and discarding irrelevant information such as background noise. In essence, speech feature extraction reduces the redundancy of the signal, describing the characteristics of the speech with less data.

The speech is split into many frames, and each frame corresponds to a spectrum (computed with a short-time FFT) that expresses the relationship between frequency and energy. In practice there are three kinds of spectra: the linear amplitude spectrum, the log amplitude spectrum, and the power spectrum. (In the log amplitude spectrum the amplitude of every spectral line is taken on a logarithmic scale, so the vertical axis is in dB (decibels); the point of this transform is to raise low-amplitude components relative to high-amplitude ones, so that periodic signals buried in low-level noise become visible.)

First, plot the spectrum of a single frame on a coordinate axis (the left panel of the original figure). Rotating that spectrum by 90 degrees gives the middle panel. Then map the amplitudes to gray levels (equivalently, quantize the continuous amplitudes into K levels, e.g. 16 or 32), where 0 is black and 255 is white: the larger the amplitude, the darker the corresponding region, which gives the rightmost panel. Concatenating the spectra of every frame in time order produces a time-varying spectrogram, which is the spectrogram that describes the speech signal.

2.1.2 Mel-Frequency Cepstral Coefficient (MFCC) extraction

To tag songs with musical "genes", the audio first has to be converted into a format a computer can recognize, without distorting it. The speech feature most widely used in industry is the Mel-scale Frequency Cepstral Coefficients (MFCC).
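As a short companion sketch in Python (my addition, not part of the original article), assuming librosa and a hypothetical file song.wav, the log-amplitude mel spectrogram and the MFCCs described above can be computed like this:

```python
import numpy as np
import librosa

# Hypothetical input file; substitute your own audio
y, sr = librosa.load("song.wav")

# Frame-wise short-time spectra -> mel spectrogram (power), then converted to dB (log amplitude)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# MFCCs: DCT of the log mel spectrum, keeping the first 20 coefficients
mfcc = librosa.feature.mfcc(S=S_db, n_mfcc=20)

print(S_db.shape, mfcc.shape)   # (128, n_frames) and (20, n_frames)
```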

Atitit: Technical Principles of Speech Recognition

醉酒当歌 · submitted on 2020-02-15 22:57:23
Atitit: Technical Principles of Speech Recognition
1.1. Speech recognition technology, also known as Automatic Speech Recognition (ASR)
1.2. Models: mainstream large-vocabulary speech recognition systems currently rely mostly on statistical pattern recognition
1.3. Basic approaches: in general there are three kinds of methods: methods based on vocal-tract models and phonetic knowledge, template matching, and artificial neural networks
1.3.1. Template matching
1.4. In general there are three kinds of speech recognition methods: methods based on vocal-tract models and phonetic knowledge, template matching, and artificial neural networks
1.5. One cannot discuss speech recognition without mentioning Nuance; Nuance's speech technology is based on statistical inference, focusing on phonemes (the sounds of syllables) and context to recognize utterances
1.6. Neural networks: this technique can improve accuracy by more than 25%, a huge leap, since a 5% gain is already considered revolutionary in this industry
1.7. Speech signal preprocessing and feature extraction
1.7.1. Methods based on phonetics and acoustics
1.8. PCM files, commonly known as wav files
1.9. VAD silence removal
1.10. To analyze sound it must be split into frames, i.e. cut into short segments, each called a frame (see the framing sketch after this list)
1.11. A complete statistics-based speech recognition system can be roughly divided into three parts
1.12. MFCC features: the features are mainly used
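Item 1.10 (framing) can be made concrete with a small sketch (my addition, not from the outline), assuming librosa and a typical 25 ms frame with a 10 ms hop for speech:

```python
import numpy as np
import librosa

sr = 16000
y = np.random.randn(2 * sr).astype(np.float32)   # 2 s of stand-in audio

# 25 ms frames with a 10 ms hop, a common choice for speech features
frames = librosa.util.frame(y, frame_length=int(0.025 * sr), hop_length=int(0.010 * sr))
print(frames.shape)   # (400, n_frames): each column is one 400-sample frame
```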

Piano Note Recognition Practice (VB.NET / Python)

送分小仙女□ · submitted on 2020-01-19 16:38:24
Imports NAudio.Wave
Imports MathNet.Numerics.IntegralTransforms
Imports System.Numerics
Imports TensorFlow
Imports System.IO

Public Class Form1
    ' Recording device
    Dim wav As New WaveInEvent

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Set the buffer size: buffer size = sample rate * milliseconds * bytes / 1000
        wav.BufferMilliseconds = 128
        wav.NumberOfBuffers = 6 ' originally 12; fewer buffers so that recording is not interrupted
        wav.WaveFormat = New WaveFormat(16000, 16, 1) ' format: 16000 Hz, 16-bit, mono
        ' Register the data-available callback
        AddHandler wav.DataAvailable, AddressOf waveIn_DataAvailable
        wav.StartRecording()
    End Sub

    ' Callback function: buffers for incoming samples
    Dim WavData16(2048 - 1) As Int16
    Dim WavDataDb(2048 - 1) As Single

Creating wave data from FFT data?

萝らか妹 · submitted on 2020-01-16 19:39:58
Question: As you might notice, I am really new to Python and sound processing. I (hopefully) extracted FFT data from a wave file using Python and the logfbank and mfcc functions. (The logfbank seems to give the most promising data; the mfcc output looked a bit weird to me.) In my program I want to change the logfbank/mfcc data and then create wave data from it (and write it to a file). I didn't really find any information about the process of creating wave data from FFT data. Does anyone of you have an idea how to do this?
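There is no exact inverse, because MFCCs discard phase and most spectral detail, but librosa ships approximate inverse transforms (this is my suggestion, not something from the thread, and it assumes you recompute the MFCCs with librosa rather than the logfbank/mfcc functions mentioned above). A minimal sketch with a hypothetical input.wav:

```python
import librosa
import soundfile as sf

y, sr = librosa.load("input.wav")                    # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# ... modify the MFCC matrix here ...

# Approximate resynthesis: MFCC -> mel spectrogram -> waveform (via Griffin-Lim)
y_hat = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)
sf.write("output.wav", y_hat, sr)
```

Expect the reconstruction to sound muffled and somewhat robotic; the information thrown away when computing MFCCs cannot be recovered.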