STFT understanding using librosa

╄→尐↘猪︶ㄣ submitted on 2021-01-25 07:00:51

Question


I have an audio sample of about 14 seconds at an 8 kHz sample rate. I'm using librosa to extract some features from this audio file.

import numpy as np
import librosa

n_fft = 2048
y, sr = librosa.load(file_name)  # resampled to librosa's default sr=22050 on load
stft = np.abs(librosa.stft(y, n_fft=n_fft))

# file_length = 14.650022675736961  # sec
# defaults:
# n_fft = 2048
# hop_length = 512  # win_length // 4 = n_fft // 4 = 512 (win_length = n_fft by default)

# windowTime = n_fft * Ts  # Ts = 1/sr

stft.shape
# (1025, 631)

Specshow:

librosa.display.specshow(stft, x_axis='time', y_axis='log')

[![stft sr = 22050][1]][1]

Now I can understand the shape of the STFT:

631 time bins ≈ 4 * (file_length / windowTime), because consecutive windows overlap (hop_length = n_fft/4)
1025 frequency bins = 1 + n_fft/2, spaced sr/n_fft apart,
so there are 1025 frequencies from 0 to sr/2 (Nyquist)
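The shape arithmetic above can be checked with a short sketch. It assumes librosa's defaults (center=True, hop_length = n_fft // 4) and a signal of 323,033 samples, i.e. the 14.650022675736961 s reported above times 22050 Hz; numpy's rfftfreq is used here as a stand-in for listing the bin centres.

```python
import numpy as np

sr = 22050
n_fft = 2048
hop_length = n_fft // 4     # librosa's default hop: win_length // 4, with win_length = n_fft
n_samples = 323033          # assumed: 14.650022675736961 s * 22050 Hz

# Frequency axis: 1 + n_fft // 2 bins, spaced sr / n_fft Hz apart, from 0 Hz up to sr / 2
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
n_bins = len(freqs)

# Time axis: with center=True, frame t is centred on sample t * hop_length,
# so (for an even n_fft) frames exist for t = 0 .. n_samples // hop_length
n_frames = 1 + n_samples // hop_length

print(n_bins, n_frames)     # 1025 631
print(sr / n_fft)           # bin spacing in Hz: 10.7666015625
print(4 * (n_samples / sr) / (n_fft / sr))  # the "4 * file_length / windowTime" estimate, about 630.9
```

The overlap-based estimate lands just under the exact frame count because centring adds one extra frame at the start.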

What I can't understand is why the plots differ for two different sample rates with the same n_fft/sr ratio:
1. 22050 Hz, librosa's default (the file is resampled on load)
2. 8 kHz, the file's native sampling rate

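y2, sr2 = librosa.load(file_name, sr=None)  # sr=None keeps the native 8000 Hz

n_fft2 = 743  # same ratio as 2048/22050, to get comparable visuals
hop_length = 186  # intended as n_fft2/4, but it is not passed to stft below,
                  # so librosa uses its default n_fft2 // 4 = 185

stft2 = np.abs(librosa.stft(y2, n_fft=n_fft2))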

So of course the shape of the STFT is different:

stft2.shape
# (372, 634)
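A sketch of where 634 (rather than 631) comes from, assuming the native 8 kHz file has 117,200 samples (14.65 s × 8000 Hz; the exact sample count is not stated in the question) and librosa's default centred framing. Because hop_length is never passed to librosa.stft, the hop actually used is the default 743 // 4 = 185, not 186. The helper stft_shape is hypothetical and only reproduces the frame-count arithmetic.

```python
def stft_shape(n_samples, n_fft, hop_length):
    """Hypothetical helper: (n_bins, n_frames) of a centred STFT, as librosa.stft gives by default."""
    pad = n_fft // 2                        # center=True pads n_fft // 2 samples on each side
    n_bins = 1 + n_fft // 2
    n_frames = 1 + (n_samples + 2 * pad - n_fft) // hop_length
    return n_bins, n_frames

n_samples_8k = 117200   # assumed native length: 14.65 s * 8000 Hz
n_fft2 = 743

print(stft_shape(n_samples_8k, n_fft2, n_fft2 // 4))  # default hop 185 -> (372, 634)
print(stft_shape(n_samples_8k, n_fft2, 186))          # hop_length=186 passed -> (372, 631)
```

Passing hop_length=hop_length to librosa.stft would therefore give the same 631 frames as the 22050 Hz version.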


[![stft sr = 8000, n_fft = 743][2]][2]

1. Why are the absolute frequencies not the same? It's the same signal, just not upsampled to 22050 Hz, so each sample is unique. What am I missing? Is it the fixed y-axis?

2. I can't make sense of the time-bin values. I expect one time bin per frame, with the first bin at the hop length and each following bin one windowTime further on, until the end of the file. But the units look weird.

I want to be able to extract the magnitude of a specific frequency bin at a specific time (frame), and additionally to sum several of those to get the magnitude over a time range.

Therefore stft[f_bin] is one row of the 1025 frequency bins (indices 0 to 1024). Looking at its contents, stft[0] contains 631 points, one per time frame, and every frequency row shares the same time points.

So if I take one row whose time points are shared by all the other frequency bins, such as stft[0], I should be able to choose a time frame and a frequency bin and get the specific magnitude:

times =  librosa.core.frames_to_time(stft2[0], sr=sr2, n_fft=n_fft2, hop_length=hop_length) 

fft_bin = 6
time_idx = 10

print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('amplitude', stft[fft_bin, time_idx])

array([0.047375, 0.047625, 0.04825 , 0.04825 , 0.046875, 0.04675 , 0.05 , 0.051625, 0.051 , 0.048 , 0.05225 , 0.050375, 0.04925 , 0.04725 , 0.051625, 0.0465 , 0.05225 , 0.05 , 0.053 , 0.053875, 0.048 , 0.0485 , 0.047875, 0.04775 , 0.0485 , 0.049 , 0.051375, 0.047125, 0.051125, 0.047125, 0.04725 , 0.05025 , 0.05425 , 0.05475 , 0.051375, 0.060375, 0.050625, 0.04875 , 0.054125, 0.048 , 0.05025 , 0.052375, 0.04975 , 0.054125, 0.055625, 0.047125, 0.0475 , 0.047 , 0.049875, 0.05025 , 0.048375, 0.047 , 0.050625, 0.05 , 0.046625, 0.04925 , 0.048 , 0.049125, 0.05375 , 0.0545 , 0.04925 , 0.049125, 0.049125, 0.049625, 0.047 , 0.047625, 0.0535 , 0.051875, 0.05075 , 0.04975 , 0.047375, 0.049 , 0.0485 , 0.050125, 0.048 , 0.05475 , 0.05175 , 0.050125, 0.04725 , 0.0575 , 0.056875, 0.047 , 0.0485 , 0.055375, 0.04975 , 0.047 , 0.0495 , 0.051375, 0.04675 , 0.04925 , 0.052125, 0.04825 , 0.048125, 0.046875, 0.047 , 0.048625, 0.050875, 0.05125 , 0.04825 , 0.052125, 0.052375, 0.05125 , 0.049875, 0.048625, 0.04825 , 0.0475 , 0.048375, 0.050875, 0.052875, 0.0475 , 0.0485 , 0.05225 , 0.053625, 0.05075 , 0.0525 , 0.047125, 0.0485 , 0.048875, 0.049 , 0.0515 , 0.055875, 0.0515 , 0.05025 , 0.05125 , 0.054625, 0.05525 , 0.047 , 0.0545 , 0.052375, 0.049875, 0.051 , 0.048625, 0.0475 , 0.048 , 0.048875, 0.050625, 0.05375 , 0.051875, 0.048125, 0.052125, 0.048125, 0.051 , 0.052625, 0.048375, 0.047625, 0.05 , 0.048125, 0.050375, 0.049125, 0.053125, 0.053875, 0.05075 , 0.052375, 0.048875, 0.05325 , 0.05825 , 0.055625, 0.0465 , 0.05475 , 0.051125, 0.048375, 0.0505 , 0.04675 , 0.0495 , 0.04725 , 0.046625, 0.049625, 0.054 , 0.056125, 0.05175 , 0.050625, 0.050375, 0.047875, 0.047 , 0.048125, 0.048875, 0.050625, 0.049875, 0.047 , 0.0505 , 0.047 , 0.053125, 0.047625, 0.05025 , 0.04825 , 0.05275 , 0.051625, 0.05 , 0.051625, 0.05425 , 0.052 , 0.04775 , 0.047 , 0.049125, 0.05375 , 0.0535 , 0.04925 , 0.05125 , 0.046375, 0.04775 , 0.04775 , 0.0465 , 0.047 , 0.04675 , 0.04675 , 0.04925 , 0.05125 , 0.046375, 
0.04825 , 0.0525 , 0.057875, 0.056375, 0.054375, 0.04825 , 0.0535 , 0.05475 , 0.0485 , 0.048875, 0.048625, 0.0485 , 0.047625, 0.046875, 0.0465 , 0.05125 , 0.054 , 0.05 , 0.048 , 0.047875, 0.0515 , 0.048125, 0.055875, 0.054875, 0.051625, 0.048125, 0.047625, 0.048375, 0.052875, 0.0485 , 0.0475 , 0.0495 , 0.05025 , 0.05675 , 0.0585 , 0.051625, 0.05625 , 0.0605 , 0.052125, 0.0495 , 0.049 , 0.047875, 0.051375, 0.054125, 0.0525 , 0.0515 , 0.057875, 0.055 , 0.05375 , 0.046375, 0.04775 , 0.0485 , 0.050125, 0.050875, 0.04925 , 0.049125, 0.0465 , 0.04975 , 0.053375, 0.05225 , 0.0475 , 0.046375, 0.05375 , 0.049875, 0.049875, 0.047375, 0.049125, 0.049375, 0.04875 , 0.048125, 0.05075 , 0.0505 , 0.046375, 0.047375, 0.048625, 0.0485 , 0.047125, 0.052625, 0.051125, 0.04725 , 0.050875, 0.053875, 0.0475 , 0.0495 , 0.051 , 0.055 , 0.053 , 0.050125, 0.04675 , 0.05375 , 0.054375, 0.04725 , 0.046875, 0.04925 , 0.04725 , 0.0495 , 0.05075 , 0.050875, 0.04775 , 0.05125 , 0.050125, 0.047875, 0.04825 , 0.046625, 0.0475 , 0.046375, 0.04775 , 0.05075 , 0.048125, 0.046375, 0.049625, 0.0495 , 0.04675 , 0.046625, 0.0475 , 0.04825 , 0.053 , 0.050875, 0.049 , 0.057875, 0.058875, 0.049875, 0.049125, 0.0475 , 0.05225 , 0.055 , 0.055375, 0.053875, 0.051125, 0.049875, 0.05025 , 0.050875, 0.049 , 0.0575 , 0.051875, 0.049375, 0.04775 , 0.051125, 0.050375, 0.0465 , 0.047375, 0.0465 , 0.046375, 0.048875, 0.051875, 0.047 , 0.047125, 0.047125, 0.046875, 0.049625, 0.048625, 0.051 , 0.049 , 0.046375, 0.049 , 0.056125, 0.054625, 0.047625, 0.046625, 0.0475 , 0.051875, 0.05175 , 0.047625, 0.050375, 0.055125, 0.05275 , 0.047125, 0.05325 , 0.060125, 0.056625, 0.053 , 0.052125, 0.047125, 0.04825 , 0.050375, 0.05025 , 0.048 , 0.046625, 0.047125, 0.04875 , 0.047 , 0.05525 , 0.0535 , 0.047 , 0.0495 , 0.0535 , 0.05125 , 0.046625, 0.0495 , 0.04675 , 0.04875 , 0.047125, 0.04975 , 0.047 , 0.049875, 0.046875, 0.047125, 0.048 , 0.046375, 0.0495 , 0.04975 , 0.05125 , 0.048375, 0.049125, 0.0515 , 0.048375, 0.052375, 0.051125, 
0.046375, 0.047125, 0.050375, 0.0465 , 0.052375, 0.05375 , 0.04925 , 0.05025 , 0.0565 , 0.054875, 0.048 , 0.049375, 0.052625, 0.055375, 0.053375, 0.05075 , 0.048875, 0.05475 , 0.05075 , 0.0485 , 0.049125, 0.0475 , 0.047375, 0.047375, 0.047 , 0.052125, 0.053875, 0.049 , 0.052625, 0.0485 , 0.04675 , 0.04875 , 0.05 , 0.0545 , 0.05025 , 0.0495 , 0.0515 , 0.0485 , 0.05025 , 0.0465 , 0.0465 , 0.048375, 0.06375 , 0.10175 , 0.11975 , 0.118375, 0.121375, 0.12675 , 0.123 , 0.095375, 0.055 , 0.05525 , 0.04775 , 0.053125, 0.052375, 0.056625, 0.0565 , 0.046875, 0.048 , 0.05175 , 0.048 , 0.052 , 0.048 , 0.048 , 0.05175 , 0.05025 , 0.049625, 0.049625, 0.047375, 0.046625, 0.052375, 0.0555 , 0.051375, 0.050625, 0.052375, 0.050125, 0.048 , 0.052125, 0.052125, 0.0495 , 0.048875, 0.048 , 0.049875, 0.051125, 0.050625, 0.048 , 0.0465 , 0.048 , 0.04675 , 0.050875, 0.048 , 0.046625, 0.0495 , 0.050375, 0.046625, 0.0515 , 0.049875, 0.049625, 0.04675 , 0.049125, 0.05025 , 0.050375, 0.04725 , 0.047625, 0.047 , 0.051625, 0.0485 , 0.05225 , 0.046875, 0.0475 , 0.04825 , 0.050375, 0.05725 , 0.052375, 0.048 , 0.046375, 0.0475 , 0.0495 , 0.047875, 0.046375, 0.049875, 0.046875, 0.048 , 0.046875, 0.048625, 0.047125, 0.046625, 0.05 , 0.048875, 0.04675 , 0.050125, 0.05425 , 0.051375, 0.050125, 0.053375, 0.052 , 0.053875, 0.048 , 0.05575 , 0.049875, 0.052125, 0.048875, 0.047375, 0.048875, 0.049125, 0.047375, 0.047375, 0.047625, 0.0495 , 0.04825 , 0.047875, 0.04875 , 0.054 , 0.052125, 0.051 , 0.046625, 0.04925 , 0.05075 , 0.054375, 0.0555 , 0.051625, 0.046625, 0.052125, 0.055875, 0.047 , 0.053875, 0.050875, 0.0505 , 0.0465 , 0.053125, 0.050875, 0.050625, 0.051125, 0.050875, 0.056875, 0.04925 , 0.050625, 0.054125, 0.056625, 0.05025 , 0.0465 , 0.04675 , 0.049625, 0.047 , 0.048375, 0.047125, 0.04875 , 0.048375, 0.048875, 0.04775 , 0.04775 , 0.047 , 0.052125, 0.050875, 0.054 , 0.058375, 0.054 , 0.049125, 0.04675 , 0.051875, 0.05425 , 0.050125, 0.04675 , 0.047625, 0.046375, 0.05275 , 0.053 , 0.04875 , 
0.049125, 0.047125, 0.049375, 0.0475 , 0.051125, 0.0495 , 0.052375, 0.047 , 0.047125, 0.050875])


  [1]: https://i.imgur.com/OeKzvrb.png
  [2]: https://i.imgur.com/ALtba5F.png

Answer 1:


Question 1:

You need to specify the sampling rate when using specshow:

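librosa.display.specshow(stft, x_axis='time', y_axis='log', sr=sr, hop_length=hop_length)

Otherwise the default values (sr=22050 Hz, hop_length=512) will be used (see docs), and the axes are labeled as if the audio were sampled at 22,050 Hz with a 512-sample hop.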
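To see why the y-axis is off without sr, compare the frequencies assigned to the same bins of the 743-point STFT under the file's real rate (8000 Hz) and under the 22050 Hz default. This sketch uses numpy's rfftfreq as a stand-in for librosa's own bin-centre calculation, whose exact labels may differ slightly.

```python
import numpy as np

n_fft2 = 743
true_freqs = np.fft.rfftfreq(n_fft2, d=1.0 / 8000)      # bin centres at the file's real rate
assumed_freqs = np.fft.rfftfreq(n_fft2, d=1.0 / 22050)  # labels if sr falls back to 22050

print(len(true_freqs))                    # 372
print(true_freqs[-1], assumed_freqs[-1])  # top bin: just under 4000 Hz vs. just under 11025 Hz
print(assumed_freqs[1] / true_freqs[1])   # every label is scaled by 22050 / 8000, about 2.76
```

The data are identical; only the Hz labels attached to each row change, which is why the same content appears at different absolute frequencies in the two plots.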

Question 2:

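librosa.core.frames_to_time does not take stft[0] as its argument: stft[0] is the first row of the spectrogram, i.e. the magnitudes of the lowest frequency bin (0 Hz) across all frames. Passing it treats each magnitude as a frame number, which is why the "times" printed in the question are all close to 0.05 s. Instead, the function takes frame indices (a single frame number or an array of them) as its first argument.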

Imagine you have an audio signal with sr=10000 Hz and you run an STFT over it using n_fft=2000 and hop_length=1000. Consecutive frames are then one hop apart, and a hop is 0.1 s long: 10000 samples correspond to 1 s, so 1000 samples (1 hop) correspond to 0.1 s.
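The arithmetic of that example as a sketch: with centred frames (librosa's default), frame t sits at sample t * hop_length, so its time is t * hop_length / sr. This mirrors what librosa.frames_to_time(frames, sr=sr, hop_length=hop_length) computes when n_fft is not passed.

```python
import numpy as np

sr = 10000
hop_length = 1000

frames = np.arange(5)                  # frame indices 0, 1, 2, 3, 4
times = frames * hop_length / sr       # seconds
print(times)                           # [0.  0.1 0.2 0.3 0.4]
```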

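stft[0] is not a frame number. Instead, the output of librosa.stft has shape (1 + n_fft/2, t) (see the librosa.stft documentation). The first dimension is the frequency bin and the second dimension is the frame number (t), so stft[:, t] is frame t and stft[f] is frequency bin f across all frames.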

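The total number of frames in stft is therefore stft.shape[1]. To get (approximately) the length of the source audio, you could do:

time = librosa.core.frames_to_time(stft.shape[1], sr=sr, hop_length=hop_length)  # n_fft omitted: passing it adds an n_fft // 2 sample offset meant for non-centered STFTs, while librosa.stft centers frames by default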
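Putting this together for the original goal (the magnitude of one frequency bin at one time, and its sum over a time range): rows are frequency bins and columns are frames. A numpy-only sketch; the random array S is a hypothetical stand-in for np.abs(librosa.stft(y2, n_fft=743)), and the time and frequency formulas assume librosa's defaults (centred frames, hop = 743 // 4 = 185).

```python
import numpy as np

sr = 8000
n_fft = 743
hop_length = n_fft // 4   # 185: librosa's default hop for this n_fft

# Hypothetical stand-in for np.abs(librosa.stft(y2, n_fft=n_fft)), shape (n_bins, n_frames)
rng = np.random.default_rng(0)
S = rng.random((1 + n_fft // 2, 634))

freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)        # frequency (Hz) of each row
times = np.arange(S.shape[1]) * hop_length / sr   # time (s) of each column, centred frames

fft_bin, time_idx = 6, 10
print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('magnitude', S[fft_bin, time_idx])

# Sum one bin's magnitude over a time range, here 1.0 s to 2.0 s
t0, t1 = 1.0, 2.0
first = int(t0 * sr // hop_length)   # frame at or just before t0
last = int(t1 * sr // hop_length)    # frame at or just before t1
range_sum = S[fft_bin, first:last + 1].sum()
print('summed magnitude', range_sum)
```

librosa.time_to_frames(t, sr=sr, hop_length=hop_length) performs a similar floor-style time-to-frame conversion if you prefer to stay within librosa.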


Source: https://stackoverflow.com/questions/57058875/stft-understanding-using-librosa
