audio-processing

How to play and read .caf PCM audio file

大憨熊 submitted on 2020-01-15 03:06:29
Question: I have an app that selects a song from the iPod Library and then copies that song into the app's directory as a '.caf' file. I now need to play that file and, at the same time, read it into Apple's FFT from the Accelerate framework so I can visualize the data as a spectrogram. Here is the code for the FFT: void FFTAccelerate::doFFTReal(float samples[], float amp[], int numSamples) { int i; vDSP_Length log2n = log2f(numSamples); //Convert float array of reals samples to COMPLEX_SPLIT array A vDSP_ctoz

How to play and read .caf PCM audio file

大兔子大兔子 submitted on 2020-01-15 03:06:10
Question: I have an app that selects a song from the iPod Library and then copies that song into the app's directory as a '.caf' file. I now need to play that file and, at the same time, read it into Apple's FFT from the Accelerate framework so I can visualize the data as a spectrogram. Here is the code for the FFT: void FFTAccelerate::doFFTReal(float samples[], float amp[], int numSamples) { int i; vDSP_Length log2n = log2f(numSamples); //Convert float array of reals samples to COMPLEX_SPLIT array A vDSP_ctoz

How to use a context window to segment a whole log Mel-spectrogram (ensuring the same number of segments for all the audios)?

筅森魡賤 submitted on 2020-01-03 01:55:14
Question: I have several audio files of different durations, and I don't know how to ensure the same number N of segments for each of them. I'm trying to implement an existing paper, which says that a log Mel-spectrogram is first computed over the whole audio with 64 Mel filter banks from 20 to 8000 Hz, using a 25 ms Hamming window and a 10 ms overlap. To get that, I have the following lines of code: y, sr = librosa.load(audio_file, sr=None) #sr = 22050 #len(y) = 237142 #duration = 5
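One way to get the same number of segments from clips of different lengths is to fix the segment count and the context-window width, then let the spacing between windows vary per clip. The sketch below illustrates that idea with NumPy only. It is not the method from the paper the question refers to; the function name `segment_spectrogram`, the window width of 64 frames, the count of 10 segments, and the frame counts are all assumptions chosen for illustration, and random arrays stand in for real log Mel-spectrograms.

```python
import numpy as np

def segment_spectrogram(log_mel: np.ndarray, num_segments: int, context: int) -> np.ndarray:
    """Cut a (n_mels, n_frames) array into num_segments windows of `context` frames.

    The windows are spread evenly over the clip, so longer clips get a
    larger hop between windows and shorter clips get more overlap.
    """
    n_mels, n_frames = log_mel.shape
    if n_frames < context:
        # Assumption: clips shorter than one window are padded on the right
        # by repeating the last frame.
        log_mel = np.pad(log_mel, ((0, 0), (0, context - n_frames)), mode="edge")
        n_frames = context
    # Evenly spaced start frames: the first window starts at frame 0 and the
    # last window ends at the final frame.
    starts = np.linspace(0, n_frames - context, num_segments).round().astype(int)
    return np.stack([log_mel[:, s:s + context] for s in starts])

# Hypothetical stand-ins for two log Mel-spectrograms with 64 Mel bands.
rng = np.random.default_rng(0)
short_clip = rng.standard_normal((64, 501))
long_clip = rng.standard_normal((64, 1203))

segments_short = segment_spectrogram(short_clip, num_segments=10, context=64)
segments_long = segment_spectrogram(long_clip, num_segments=10, context=64)
print(segments_short.shape, segments_long.shape)
```

Both calls return an array of shape `(num_segments, n_mels, context)`, so the downstream model sees the same input size regardless of clip duration. Whether overlapping windows are acceptable depends on the paper being reproduced.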

Python NumPy - FFT and Inverse FFT?

为君一笑 submitted on 2020-01-01 10:10:10
Question: I've been working with FFT, and I'm currently trying to get a sound waveform from a file with FFT (and modify it eventually), then output that modified waveform back to a file. I've taken the FFT of the sound wave and then applied an inverse FFT to it, but the output file doesn't sound right at all. I haven't done any filtering on the waveform. I'm just testing getting the frequency data and then putting it back into a file. It should sound the same, but it sounds wildly
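For reference, an unmodified FFT followed by an inverse FFT should reproduce the input samples. The sketch below shows such a round trip with `np.fft.rfft` and `np.fft.irfft` on a synthetic 16-bit tone. The helper name `fft_round_trip`, the 8000 Hz sample rate, and the test tone are assumptions for illustration; the question's own code is not shown in the excerpt, so this does not diagnose its specific problem. The sketch does restore the original integer dtype before returning, since the inverse transform yields floating-point values.

```python
import numpy as np

def fft_round_trip(samples: np.ndarray) -> np.ndarray:
    """Transform real samples to the frequency domain and back again."""
    original_dtype = samples.dtype
    # rfft expects real input; work in float64 to keep rounding error small.
    spectrum = np.fft.rfft(samples.astype(np.float64))
    # Any frequency-domain modification would be applied to `spectrum` here.
    # Passing n keeps the output length equal to the input length,
    # including for odd-length inputs.
    restored = np.fft.irfft(spectrum, n=len(samples))
    if np.issubdtype(original_dtype, np.integer):
        # irfft returns floats: round and clip before casting back to PCM.
        info = np.iinfo(original_dtype)
        restored = np.clip(np.rint(restored), info.min, info.max)
    return restored.astype(original_dtype)

# Hypothetical test signal: one second of a 440 Hz tone as 16-bit PCM.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

restored = fft_round_trip(tone)
print(restored.dtype, np.array_equal(tone, restored))
```

If the restored array is written to a file, the dtype needs to match what the file writer expects for the chosen sample format.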

Python NumPy - FFT and Inverse FFT?

时光总嘲笑我的痴心妄想 submitted on 2020-01-01 10:10:08
Question: I've been working with FFT, and I'm currently trying to get a sound waveform from a file with FFT (and modify it eventually), then output that modified waveform back to a file. I've taken the FFT of the sound wave and then applied an inverse FFT to it, but the output file doesn't sound right at all. I haven't done any filtering on the waveform. I'm just testing getting the frequency data and then putting it back into a file. It should sound the same, but it sounds wildly

Acoustic Audio Comparing Library

倖福魔咒の submitted on 2020-01-01 03:49:05
Question: I need software or a library that handles audio comparison without using the tags inside the mp3. It should report the similarity or confidence between two audio files, or, if I cut a piece from an audio file, it should point to where that piece was taken from in the main audio file (I hope I was clear enough). From what I have heard, this technology is called Audio Acoustic Comparing, and it is based on an audio sample file, which we can call a fingerprint. The software should tell me if it finds an

Media Foundation get encoded bitrate

陌路散爱 submitted on 2019-12-25 07:54:56
Question: I am trying to get the encoded bitrate of an audio file (mp4, m4a, aac) using Media Foundation. What I did is: PROPVARIANT prop; IMFSourceReader* reader; MFCreateSourceReaderFromURL(filePath, NULL, &reader); reader->GetPresentationAttribute(MF_SOURCE_READER_MEDIASOURCE, MF_PD_AUDIO_ENCODING_BITRATE, &prop); The GetPresentationAttribute call fails with an error and leaves the PROPVARIANT empty. However, when I do: reader->GetPresentationAttribute(MF_SOURCE_READER_MEDIASOURCE, MF_PD_DURATION, &prop); it works fine. Does

implementing FftPitchDetector in C#

爷,独闯天下 submitted on 2019-12-25 02:39:05
Question: I've added FftPitchDetector.cs to my project, but I'm not sure how to use it. My code: private void sourceStream_DataAvailable(object sender, NAudio.Wave.WaveInEventArgs e) { if (waveWriter == null) return; byte[] buffer = e.Buffer; float sample32 = 0; int bytesRecorded = e.BytesRecorded; float[] floats = new float[buffer.Length]; waveWriter.Write(buffer, 0, bytesRecorded); for (int index = 0; index < e.BytesRecorded; index += 2) { short sample = (short)((buffer[index + 1] << 8) | buffer
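The loop in the handler combines pairs of bytes into 16-bit little-endian samples. As a rough illustration of that conversion step only, here is the same idea sketched in Python with NumPy rather than C#. The function name `pcm16_to_float` is hypothetical, and scaling by 32768 to reach the range [-1, 1) is an assumption, since the excerpt is cut off before it shows how the samples are scaled or passed to the pitch detector.

```python
import numpy as np

def pcm16_to_float(buffer: bytes, bytes_recorded: int) -> np.ndarray:
    """Decode 16-bit little-endian PCM bytes into float32 samples."""
    # Only the first bytes_recorded bytes are valid; drop a trailing odd byte.
    usable = bytes_recorded - (bytes_recorded % 2)
    samples = np.frombuffer(buffer[:usable], dtype="<i2")
    # Assumption: scale to [-1, 1) by dividing by 32768.
    return samples.astype(np.float32) / 32768.0

# Hypothetical buffer holding three samples: 0, half scale, and minimum.
raw = np.array([0, 16384, -32768], dtype="<i2").tobytes()
decoded = pcm16_to_float(raw, len(raw))
print(decoded)
```

Note that a buffer of N bytes yields N / 2 samples, so the float array only needs half as many elements as the byte buffer.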

asterisk silence detection on connected call

不想你离开。 submitted on 2019-12-24 15:03:40
Question: Sorry in advance if my question makes no sense to you. I am a newbie with Asterisk, and what I am trying to do is write a dial plan that connects two softphone endpoints (VoIP client endpoints) and then tries to detect silence in the ongoing call. I am able to put a call through by using the following dial plan: exten = 100, 1, Answer() same = 100, n, Monitor() same = 100, n, Dial(SIP/client1,15) When I dial 100, it calls client1, which I answer successfully, and the call is now ongoing. Now I

Breaking a video into frames with python

久未见 submitted on 2019-12-24 00:48:23
Question: I am trying to write a program that deletes the frames of a video that don't have a particular symbol in them. My general plan: (1) split the audio from the video; (2) split the video into frames; (3) run the frames through a subroutine that looks for the symbol by checking whether the pixels where it should be are the correct color, and log the frames that don't have it; (4) delete those frames and the corresponding seconds of audio; (5) splice it all back together. I need some help finding libraries that can do this. I was