Spectrogram from AVAudioPCMBuffer using Accelerate framework in Swift

霸气de小男生 提交于 2019-12-18 11:33:00

问题


I'm trying to generate a spectrogram from an AVAudioPCMBuffer in Swift. I install a tap on an AVAudioMixerNode and receive a callback with the audio buffer. I'd like to convert the signal in the buffer to a [Float:Float] dictionary where the key represents the frequency and the value represents the magnitude of the audio on the corresponding frequency.

I tried using Apple's Accelerate framework but the results I get seem dubious. I'm sure it's just in the way I'm converting the signal.

I looked at this blog post amongst other things for a reference.

Here is what I have:

self.audioEngine.mainMixerNode.installTapOnBus(0, bufferSize: 1024, format: nil, block: { buffer, when in
    let bufferSize: Int = Int(buffer.frameLength)

    // Set up the transform
    let log2n = UInt(round(log2(Double(bufferSize))))
    let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))

    // Create the complex split value to hold the output of the transform
    var realp = [Float](count: bufferSize/2, repeatedValue: 0)
    var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
    var output = DSPSplitComplex(realp: &realp, imagp: &imagp)

    // Now I need to convert the signal from the buffer to complex value, this is what I'm struggling to grasp.
    // The complexValue should be UnsafePointer<DSPComplex>. How do I generate it from the buffer's floatChannelData?
    vDSP_ctoz(complexValue, 2, &output, 1, UInt(bufferSize / 2))

    // Do the fast Fournier forward transform
    vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))

    // Convert the complex output to magnitude
    var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
    vDSP_zvmags(&output, 1, &fft, 1, vDSP_length(bufferSize / 2))

    // Release the setup
    vDSP_destroy_fftsetup(fftsetup)

    // TODO: Convert fft to [Float:Float] dictionary of frequency vs magnitude. How?
})

My questions are

  1. How do I convert the buffer.floatChannelData to UnsafePointer<DSPComplex> to pass to the vDSP_ctoz function? Is there a different/better way to do it maybe even bypassing vDSP_ctoz?
  2. Is this different if the buffer contains audio from multiple channels? How is it different when the buffer audio channel data is or isn't interleaved?
  3. How do I convert the indices in the fft array to frequencies in Hz?
  4. Anything else I may be doing wrong?

Update

Thanks everyone for suggestions. I ended up filling the complex array as suggested in the accepted answer. When I plot the values and play a 440 Hz tone on a tuning fork it registers exactly where it should.

Here is the code to fill the array:

var channelSamples: [[DSPComplex]] = []
for var i=0; i<channelCount; ++i {
    channelSamples.append([])
    let firstSample = buffer.format.interleaved ? i : i*bufferSize
    for var j=firstSample; j<bufferSize; j+=buffer.stride*2 {
        channelSamples[i].append(DSPComplex(real: buffer.floatChannelData.memory[j], imag: buffer.floatChannelData.memory[j+buffer.stride]))
    }
}

The channelSamples array then holds separate array of samples for each channel.

To calculate the magnitude I used this:

var spectrum = [Float]()
for var i=0; i<bufferSize/2; ++i {
    let imag = out.imagp[i]
    let real = out.realp[i]
    let magnitude = sqrt(pow(real,2)+pow(imag,2))
    spectrum.append(magnitude)
}

回答1:


  1. Hacky way: you can just cast a float array. Where reals and imag values are going one after another.
  2. It depends on if audio is interleaved or not. If it's interleaved (most of the cases) left and right channels are in the array with STRIDE 2
  3. Lowest frequency in your case is frequency of a period of 1024 samples. In case of 44100kHz it's ~23ms, lowest frequency of the spectrum will be 1/(1024/44100) (~43Hz). Next frequency will be twice of this (~86Hz) and so on.



回答2:


4: You have installed a callback handler on an audio bus. This is likely run with real-time thread priority and frequently. You should not do anything that has potential for blocking (it will likely result in priority inversion and glitchy audio):

  1. Allocate memory (realp, imagp - [Float](.....) is shorthand for Array[float] - and likely allocated on the heap`. Pre-allocate these

  2. Call lengthy operations such as vDSP_create_fftsetup() - which also allocates memory and initialises it. Again, you can allocate this once outside of your function.



来源:https://stackoverflow.com/questions/32891012/spectrogram-from-avaudiopcmbuffer-using-accelerate-framework-in-swift

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!