Question
I have an audio sample of about 14 seconds recorded at an 8 kHz sample rate. I'm using librosa to extract some features from this audio file.
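import numpy as np
import librosa
import librosa.display

n_fft = 2048
y, sr = librosa.load(file_name)  # no sr given, so librosa resamples to its default 22050 Hz
stft = np.abs(librosa.stft(y, n_fft=n_fft))
# file_length = 14.650022675736961 # sec
# defaults:
# n_fft = 2048
# hop_length = 512 # win_length/4 = n_fft/4 = 512 (win_length = n_fft by default)
# windowsTime = n_fft * Ts # Ts = 1/sr
stft.shape
# (1025, 631)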
Specshow:
librosa.display.specshow(stft, x_axis='time', y_axis='log')
[![stft sr = 22050][1]][1]
Now, I can understand the shape of the STFT:
631 time bins ≈ 4 * (file_length / windowsTime), because consecutive windows overlap by 75%.
1025 frequency bins, spaced sr/n_fft apart,
so there are 1025 frequencies from 0 to sr/2 (Nyquist).
What I can't understand is why the plots differ for two different sample rates when the ratios are kept the same: (1) 22050 Hz, the librosa default, and (2) 8 kHz, the sampling rate of the file.
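y2, sr2 = librosa.load(file_name, sr=None)  # sr=None keeps the file's native 8 kHz
n_fft2 = 743  # 2048 * 8000 / 22050: same window duration, for visual comparison
hop_length = 186  # intended as n_fft2 / 4; not passed to stft below, so the default (n_fft2 // 4 = 185) applies
stft2 = np.abs(librosa.stft(y2, n_fft=n_fft2))
So of course the shape of the STFT is different: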
stft2.shape
# (372, 634)
[![stft sr = 743][2]][2]
1. But why are the absolute frequencies not the same? It's the same signal, just not oversampled, so each sample is unique. What am I missing? Is it the static y axis?
2. I couldn't understand the time bin values. I'm expecting one bin per frame, where the first bin is at the hop length and the second bin is windowTime from that point, and so on until the end of the file. But the units are weird?
I want to be able to extract the magnitude of a specific frequency bin at a specific time (frame), and additionally to sum some of those to get the magnitude for a time range.
Therefore, if I take stft[index of fBin], which is one row out of the 1025 frequency bins, and look at its contents, stft[0] contains 630 points, which are exactly 630 time points for that frequency, so each of the 1025 frequency bins will have the same time points.
So if I take one row that represents all the other frequency bins as well (same time points), which is stft[0], I should be able to choose a time frame and a frequency bin and get the specific magnitude:
times = librosa.core.frames_to_time(stft2[0], sr=sr2, n_fft=n_fft2, hop_length=hop_length)
fft_bin = 6
time_idx = 10
print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('amplitude', stft[fft_bin, time_idx])
array([0.047375, 0.047625, 0.04825 , 0.04825 , 0.046875, 0.04675 , 0.05 , 0.051625, 0.051 , 0.048 , 0.05225 , 0.050375, 0.04925 , 0.04725 , 0.051625, 0.0465 , 0.05225 , 0.05 , 0.053 , 0.053875, 0.048 , 0.0485 , 0.047875, 0.04775 , 0.0485 , 0.049 , 0.051375, 0.047125, 0.051125, 0.047125, 0.04725 , 0.05025 , 0.05425 , 0.05475 , 0.051375, 0.060375, 0.050625, 0.04875 , 0.054125, 0.048 , 0.05025 , 0.052375, 0.04975 , 0.054125, 0.055625, 0.047125, 0.0475 , 0.047 , 0.049875, 0.05025 , 0.048375, 0.047 , 0.050625, 0.05 , 0.046625, 0.04925 , 0.048 , 0.049125, 0.05375 , 0.0545 , 0.04925 , 0.049125, 0.049125, 0.049625, 0.047 , 0.047625, 0.0535 , 0.051875, 0.05075 , 0.04975 , 0.047375, 0.049 , 0.0485 , 0.050125, 0.048 , 0.05475 , 0.05175 , 0.050125, 0.04725 , 0.0575 , 0.056875, 0.047 , 0.0485 , 0.055375, 0.04975 , 0.047 , 0.0495 , 0.051375, 0.04675 , 0.04925 , 0.052125, 0.04825 , 0.048125, 0.046875, 0.047 , 0.048625, 0.050875, 0.05125 , 0.04825 , 0.052125, 0.052375, 0.05125 , 0.049875, 0.048625, 0.04825 , 0.0475 , 0.048375, 0.050875, 0.052875, 0.0475 , 0.0485 , 0.05225 , 0.053625, 0.05075 , 0.0525 , 0.047125, 0.0485 , 0.048875, 0.049 , 0.0515 , 0.055875, 0.0515 , 0.05025 , 0.05125 , 0.054625, 0.05525 , 0.047 , 0.0545 , 0.052375, 0.049875, 0.051 , 0.048625, 0.0475 , 0.048 , 0.048875, 0.050625, 0.05375 , 0.051875, 0.048125, 0.052125, 0.048125, 0.051 , 0.052625, 0.048375, 0.047625, 0.05 , 0.048125, 0.050375, 0.049125, 0.053125, 0.053875, 0.05075 , 0.052375, 0.048875, 0.05325 , 0.05825 , 0.055625, 0.0465 , 0.05475 , 0.051125, 0.048375, 0.0505 , 0.04675 , 0.0495 , 0.04725 , 0.046625, 0.049625, 0.054 , 0.056125, 0.05175 , 0.050625, 0.050375, 0.047875, 0.047 , 0.048125, 0.048875, 0.050625, 0.049875, 0.047 , 0.0505 , 0.047 , 0.053125, 0.047625, 0.05025 , 0.04825 , 0.05275 , 0.051625, 0.05 , 0.051625, 0.05425 , 0.052 , 0.04775 , 0.047 , 0.049125, 0.05375 , 0.0535 , 0.04925 , 0.05125 , 0.046375, 0.04775 , 0.04775 , 0.0465 , 0.047 , 0.04675 , 0.04675 , 0.04925 , 0.05125 , 0.046375, 
0.04825 , 0.0525 , 0.057875, 0.056375, 0.054375, 0.04825 , 0.0535 , 0.05475 , 0.0485 , 0.048875, 0.048625, 0.0485 , 0.047625, 0.046875, 0.0465 , 0.05125 , 0.054 , 0.05 , 0.048 , 0.047875, 0.0515 , 0.048125, 0.055875, 0.054875, 0.051625, 0.048125, 0.047625, 0.048375, 0.052875, 0.0485 , 0.0475 , 0.0495 , 0.05025 , 0.05675 , 0.0585 , 0.051625, 0.05625 , 0.0605 , 0.052125, 0.0495 , 0.049 , 0.047875, 0.051375, 0.054125, 0.0525 , 0.0515 , 0.057875, 0.055 , 0.05375 , 0.046375, 0.04775 , 0.0485 , 0.050125, 0.050875, 0.04925 , 0.049125, 0.0465 , 0.04975 , 0.053375, 0.05225 , 0.0475 , 0.046375, 0.05375 , 0.049875, 0.049875, 0.047375, 0.049125, 0.049375, 0.04875 , 0.048125, 0.05075 , 0.0505 , 0.046375, 0.047375, 0.048625, 0.0485 , 0.047125, 0.052625, 0.051125, 0.04725 , 0.050875, 0.053875, 0.0475 , 0.0495 , 0.051 , 0.055 , 0.053 , 0.050125, 0.04675 , 0.05375 , 0.054375, 0.04725 , 0.046875, 0.04925 , 0.04725 , 0.0495 , 0.05075 , 0.050875, 0.04775 , 0.05125 , 0.050125, 0.047875, 0.04825 , 0.046625, 0.0475 , 0.046375, 0.04775 , 0.05075 , 0.048125, 0.046375, 0.049625, 0.0495 , 0.04675 , 0.046625, 0.0475 , 0.04825 , 0.053 , 0.050875, 0.049 , 0.057875, 0.058875, 0.049875, 0.049125, 0.0475 , 0.05225 , 0.055 , 0.055375, 0.053875, 0.051125, 0.049875, 0.05025 , 0.050875, 0.049 , 0.0575 , 0.051875, 0.049375, 0.04775 , 0.051125, 0.050375, 0.0465 , 0.047375, 0.0465 , 0.046375, 0.048875, 0.051875, 0.047 , 0.047125, 0.047125, 0.046875, 0.049625, 0.048625, 0.051 , 0.049 , 0.046375, 0.049 , 0.056125, 0.054625, 0.047625, 0.046625, 0.0475 , 0.051875, 0.05175 , 0.047625, 0.050375, 0.055125, 0.05275 , 0.047125, 0.05325 , 0.060125, 0.056625, 0.053 , 0.052125, 0.047125, 0.04825 , 0.050375, 0.05025 , 0.048 , 0.046625, 0.047125, 0.04875 , 0.047 , 0.05525 , 0.0535 , 0.047 , 0.0495 , 0.0535 , 0.05125 , 0.046625, 0.0495 , 0.04675 , 0.04875 , 0.047125, 0.04975 , 0.047 , 0.049875, 0.046875, 0.047125, 0.048 , 0.046375, 0.0495 , 0.04975 , 0.05125 , 0.048375, 0.049125, 0.0515 , 0.048375, 0.052375, 0.051125, 
0.046375, 0.047125, 0.050375, 0.0465 , 0.052375, 0.05375 , 0.04925 , 0.05025 , 0.0565 , 0.054875, 0.048 , 0.049375, 0.052625, 0.055375, 0.053375, 0.05075 , 0.048875, 0.05475 , 0.05075 , 0.0485 , 0.049125, 0.0475 , 0.047375, 0.047375, 0.047 , 0.052125, 0.053875, 0.049 , 0.052625, 0.0485 , 0.04675 , 0.04875 , 0.05 , 0.0545 , 0.05025 , 0.0495 , 0.0515 , 0.0485 , 0.05025 , 0.0465 , 0.0465 , 0.048375, 0.06375 , 0.10175 , 0.11975 , 0.118375, 0.121375, 0.12675 , 0.123 , 0.095375, 0.055 , 0.05525 , 0.04775 , 0.053125, 0.052375, 0.056625, 0.0565 , 0.046875, 0.048 , 0.05175 , 0.048 , 0.052 , 0.048 , 0.048 , 0.05175 , 0.05025 , 0.049625, 0.049625, 0.047375, 0.046625, 0.052375, 0.0555 , 0.051375, 0.050625, 0.052375, 0.050125, 0.048 , 0.052125, 0.052125, 0.0495 , 0.048875, 0.048 , 0.049875, 0.051125, 0.050625, 0.048 , 0.0465 , 0.048 , 0.04675 , 0.050875, 0.048 , 0.046625, 0.0495 , 0.050375, 0.046625, 0.0515 , 0.049875, 0.049625, 0.04675 , 0.049125, 0.05025 , 0.050375, 0.04725 , 0.047625, 0.047 , 0.051625, 0.0485 , 0.05225 , 0.046875, 0.0475 , 0.04825 , 0.050375, 0.05725 , 0.052375, 0.048 , 0.046375, 0.0475 , 0.0495 , 0.047875, 0.046375, 0.049875, 0.046875, 0.048 , 0.046875, 0.048625, 0.047125, 0.046625, 0.05 , 0.048875, 0.04675 , 0.050125, 0.05425 , 0.051375, 0.050125, 0.053375, 0.052 , 0.053875, 0.048 , 0.05575 , 0.049875, 0.052125, 0.048875, 0.047375, 0.048875, 0.049125, 0.047375, 0.047375, 0.047625, 0.0495 , 0.04825 , 0.047875, 0.04875 , 0.054 , 0.052125, 0.051 , 0.046625, 0.04925 , 0.05075 , 0.054375, 0.0555 , 0.051625, 0.046625, 0.052125, 0.055875, 0.047 , 0.053875, 0.050875, 0.0505 , 0.0465 , 0.053125, 0.050875, 0.050625, 0.051125, 0.050875, 0.056875, 0.04925 , 0.050625, 0.054125, 0.056625, 0.05025 , 0.0465 , 0.04675 , 0.049625, 0.047 , 0.048375, 0.047125, 0.04875 , 0.048375, 0.048875, 0.04775 , 0.04775 , 0.047 , 0.052125, 0.050875, 0.054 , 0.058375, 0.054 , 0.049125, 0.04675 , 0.051875, 0.05425 , 0.050125, 0.04675 , 0.047625, 0.046375, 0.05275 , 0.053 , 0.04875 , 
0.049125, 0.047125, 0.049375, 0.0475 , 0.051125, 0.0495 , 0.052375, 0.047 , 0.047125, 0.050875])
[1]: https://i.imgur.com/OeKzvrb.png
[2]: https://i.imgur.com/ALtba5F.png
Answer 1:
Question 1:
You need to specify the sampling rate when using specshow:
librosa.display.specshow(stft, x_axis='time', y_axis='log', sr=sr)
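Otherwise the default value (22,050 Hz) will be used (see docs), and the frequency axis is labelled as if the signal had been sampled at that rate. The same applies to hop_length, which specshow assumes to be 512 unless told otherwise, so pass it as well if the STFT used a different hop.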
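To see how much the labels shift, here is a minimal sketch of the bin-to-frequency mapping, k * sr / n_fft, written in plain Python rather than with librosa itself. The helper name bin_frequencies is made up for this illustration; the numbers are the n_fft2 and sample rates from the question.

```python
def bin_frequencies(sr, n_fft):
    """Centre frequency in Hz of each STFT bin: k * sr / n_fft for k = 0..n_fft//2."""
    return [k * sr / n_fft for k in range(1 + n_fft // 2)]

n_fft2 = 743
true_freqs = bin_frequencies(8000, n_fft2)    # the rate the file was actually loaded at
wrong_freqs = bin_frequencies(22050, n_fft2)  # the rate specshow assumes by default

print(len(true_freqs))            # number of frequency bins
print(round(true_freqs[-1], 1))   # top bin, just under the 4000 Hz Nyquist limit
print(round(wrong_freqs[-1], 1))  # the same bin, labelled as if sr were 22050
```

The data are identical in both cases; only the axis labels differ, which is why the second plot appears to contain frequencies above 4 kHz.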
Question 2:
librosa.core.frames_to_time does not take stft[0] as its argument; stft[0] holds the magnitudes of the first frequency bin across all frames. Instead, it takes frame indices (or a number of frames) as its first argument.
Imagine you have an audio signal with sr=10000 Hz. Then you run an STFT over it using n_fft=2000 and hop_length=1000. Then you get one frame per hop and the hop is 0.1s long, because 10000 samples correspond to 1s and 1000 samples (1 hop) therefore correspond to 0.1s.
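The arithmetic in that example can be sketched in a few lines of plain Python. The helper frame_times is hypothetical and only mirrors the basic frame * hop_length / sr conversion; it ignores the optional n_fft offset that librosa.core.frames_to_time can add.

```python
def frame_times(frames, sr, hop_length):
    """Time in seconds at which each frame index starts: frame * hop_length / sr."""
    return [f * hop_length / sr for f in frames]

# The example above: sr = 10000 Hz, hop_length = 1000 samples.
times = frame_times(range(5), sr=10000, hop_length=1000)
print(times)  # one entry per frame, 0.1 s apart
```

The input is a sequence of frame indices (0, 1, 2, ...), not spectrogram values, which is why passing a row of magnitudes produces meaningless "times".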
stft[0] is not a frame number. The array returned by stft has shape (1 + n_fft/2, t) (see here). This means the first dimension is the frequency bin and the second dimension is the frame number (t).
The total number of frames in stft is therefore stft.shape[1].
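As a rough check, both shapes reported in the question can be reproduced without running an STFT. The sketch below assumes librosa's defaults as I understand them (center=True padding of n_fft // 2 samples on each side, and hop_length = n_fft // 4 when none is given). The helper stft_shape is made up, and the sample counts are inferred from the duration stated in the question, not taken from the file.

```python
def stft_shape(n_samples, n_fft, hop_length=None):
    """Expected (frequency bins, frames) of a centred STFT."""
    if hop_length is None:
        hop_length = n_fft // 4          # assumed default: win_length // 4
    n_bins = 1 + n_fft // 2              # bins from 0 Hz up to Nyquist
    padded = n_samples + 2 * (n_fft // 2)  # centring pads both ends
    n_frames = 1 + (padded - n_fft) // hop_length
    return n_bins, n_frames

# 14.650022675736961 s at 22050 Hz is 323033 samples (inferred).
shape_22k = stft_shape(323033, 2048)
# The same audio at its native 8000 Hz is taken to be 117200 samples (inferred).
shape_8k = stft_shape(117200, 743)
print(shape_22k, shape_8k)
```

Note that the second shape only comes out as 634 frames with a hop of 185 samples, which suggests the hop_length = 186 defined in the question was never passed to stft.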
To get the (approximate) length of the source audio, you could do:
time = librosa.core.frames_to_time(stft.shape[1], sr=sr, hop_length=hop_length, n_fft=n_fft)
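For the original goal of reading one magnitude, or summing a bin over a time range, the mapping can be run in the other direction: seconds to frame index and Hz to bin index. The following is a minimal sketch on a made-up toy spectrogram stored as nested lists; the helper names are hypothetical, and with a real NumPy array from librosa.stft the last step would be something like stft[fbin, start:stop].sum().

```python
import math

def time_to_frame(t, sr, hop_length):
    """Index of the frame that starts at or just before time t (seconds)."""
    return int(math.floor(t * sr / hop_length))

def hz_to_bin(freq_hz, sr, n_fft):
    """Index of the STFT bin whose centre frequency is closest to freq_hz."""
    return int(round(freq_hz * n_fft / sr))

def band_magnitude(spec, fbin, start_frame, stop_frame):
    """Sum the magnitudes of one frequency bin over frames [start, stop)."""
    return sum(spec[fbin][start_frame:stop_frame])

# Toy magnitude "spectrogram": 3 frequency bins x 5 frames (made-up values).
toy_spec = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 4.0, 2.0, 0.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]

sr, hop_length, n_fft = 10000, 1000, 4   # n_fft = 4 gives 1 + 4 // 2 = 3 bins
fbin = hz_to_bin(2500.0, sr, n_fft)      # bin spacing is sr / n_fft = 2500 Hz
start = time_to_frame(0.1, sr, hop_length)
stop = time_to_frame(0.4, sr, hop_length)
total = band_magnitude(toy_spec, fbin, start, stop)
print(fbin, start, stop, total)
```

A single value is then just spec[fbin][frame]: the first index selects the frequency bin and the second selects the frame.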
Source: https://stackoverflow.com/questions/57058875/stft-understanding-using-librosa