Question
So I looked into Core Audio recently a bit and am still a newbie. I have trouble understanding what data I am tapping into and how it affects the overall data flow. For some background: I have an app that does video/audio streaming between phones using WebRTC. However, I want to check out the data that is being input into the device through my mic and the data output through the speaker. I looked into the AurioTouch demo and Core Audio, and currently I have this:
- (void)setupIOUnit
{
    // Create a new instance of AURemoteIO
    AudioComponentDescription desc;
    desc.componentType = kAudioUnitType_Output;
    desc.componentSubType = kAudioUnitSubType_RemoteIO;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;
    desc.componentFlags = 0;
    desc.componentFlagsMask = 0;

    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioComponentInstanceNew(comp, &rioUnit);

    // Enable input and output on AURemoteIO
    // Input is enabled on the input scope of the input element
    // Output is enabled on the output scope of the output element
    UInt32 one = 1;
    AudioUnitSetProperty(rioUnit, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Input, 1, &one, sizeof(one));
    AudioUnitSetProperty(rioUnit, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Output, 0, &one, sizeof(one));

    // Set the MaximumFramesPerSlice property. This property is used to describe to an audio unit the maximum number
    // of samples it will be asked to produce on any single given call to AudioUnitRender
    UInt32 maxFramesPerSlice = 4096;
    AudioUnitSetProperty(rioUnit, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFramesPerSlice, sizeof(UInt32));

    // Get the property value back from AURemoteIO. We are going to use this value to allocate buffers accordingly
    UInt32 propSize = sizeof(UInt32);
    AudioUnitGetProperty(rioUnit, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFramesPerSlice, &propSize);

    // Set the render callback on AURemoteIO
    AURenderCallbackStruct renderCallback;
    renderCallback.inputProc = performRender;
    renderCallback.inputProcRefCon = NULL;
    AudioUnitSetProperty(rioUnit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &renderCallback, sizeof(renderCallback));
    NSLog(@"render set now");

    // Initialize the AURemoteIO instance
    AudioUnitInitialize(rioUnit);
    [self startIOUnit];
}

- (OSStatus)startIOUnit
{
    OSStatus err = AudioOutputUnitStart(rioUnit);
    if (err) NSLog(@"couldn't start AURemoteIO: %d", (int)err);
    return err;
}
Render callback function
static OSStatus performRender(void                       *inRefCon,
                              AudioUnitRenderActionFlags *ioActionFlags,
                              const AudioTimeStamp       *inTimeStamp,
                              UInt32                      inBusNumber,
                              UInt32                      inNumberFrames,
                              AudioBufferList            *ioData)
{
    OSStatus err = noErr;
    // the data gets rendered here
    err = AudioUnitRender(rioUnit, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData);

    if (ioData->mBuffers[0].mDataByteSize >= 12) {
        NSData *myAudioData = [NSData dataWithBytes:ioData->mBuffers[0].mData length:12];
        NSLog(@" playback's first 12 bytes: %@", myAudioData);
    }

    for (UInt32 i = 0; i < ioData->mNumberBuffers; ++i) {
        memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
    }
    return err;
}
This prints out some data, but at this point I do not know whether it is the microphone input or the speaker output. What disturbs me is that even after clearing ioData's buffers, I am still getting audio on the other phone and can still play the audio sent by the other phone. This kind of suggests to me that I am touching neither the mic input nor the speaker output.
I have seen some varying parameters for this line:
AudioUnitSetProperty(rioUnit, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, &renderCallback, sizeof(renderCallback));
and I am wondering if I just have these wrong or something. In addition, is this line:
err = AudioUnitRender(rioUnit, ioActionFlags, inTimeStamp, 1, inNumberFrames, ioData);
influenced by AudioUnitSetProperty? What does setting the 1 do in this scenario?
Any help would be wonderful. Ideally, I want to be able to sample the speaker output data (maybe into a file) as well as the microphone input.
Answer 1:
The RemoteIO audio unit is the part of Core Audio that does both input and output. It is a single unit that can record and play audio through the hardware (mic/speaker) and/or through your program. That sometimes makes it confusing. Think of it this way:
- You have inputs and outputs on a RemoteIO unit.
- You also have software and hardware on a RemoteIO unit.
The hardware input is the Mic.
The hardware output is the Speaker.
The software input is a waveform you create programmatically.
The software output is a waveform that has been created.
-------Inputs------------
Bus 0: read from your application (You construct audio waveforms programmatically). Here you write a callback that is automatically called periodically. It says "give me the next audio samples." For example, your code in there could give it audio samples of a triangle wave that you generate programmatically.
So you generate a waveform to feed into the program. You can also feed this input from the output of some other audio unit.
Bus 1: read from the microphone. Here you can read audio samples from the microphone. Note that these are just the raw samples. You can choose to save them to a file (e.g. a recording app), send them over a network, or even connect them to the speaker (see below). You won't hear the audio from the mic and it won't be saved unless YOU do something with it.
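If all you want is the mic data on its own, one common approach is to register a separate input callback on bus 1 with kAudioOutputUnitProperty_SetInputCallback and pull the samples yourself with AudioUnitRender. This is only a rough sketch, not the asker's exact setup: micInputCallback is a made-up name, and it assumes a 16-bit mono stream format has been set on the output scope of bus 1 and that rioUnit is the same file-level AURemoteIO instance used above.

// Rough sketch: tap the raw mic samples from input bus 1.
static OSStatus micInputCallback(void                       *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp       *inTimeStamp,
                                 UInt32                      inBusNumber,
                                 UInt32                      inNumberFrames,
                                 AudioBufferList            *ioData)
{
    // For an input callback, ioData is NULL, so supply your own buffer.
    static SInt16 samples[4096];                          // >= MaximumFramesPerSlice
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize   = inNumberFrames * sizeof(SInt16);
    bufferList.mBuffers[0].mData           = samples;

    // Pull the freshly captured frames from bus 1 (the mic).
    OSStatus err = AudioUnitRender(rioUnit, ioActionFlags, inTimeStamp,
                                   1, inNumberFrames, &bufferList);
    // samples[0..inNumberFrames-1] now holds raw mic data; copy it out here.
    return err;
}

// Registration, e.g. inside setupIOUnit:
AURenderCallbackStruct inputCallback = { micInputCallback, NULL };
AudioUnitSetProperty(rioUnit, kAudioOutputUnitProperty_SetInputCallback,
                     kAudioUnitScope_Global, 1, &inputCallback, sizeof(inputCallback));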
----------Outputs----------
Bus 0: phone speaker. Here you can write audio data and it will be played on the speaker. So you get another callback that says "give me samples to play," you fill up the buffer with audio, and it gets played (a sketch of such a callback follows after this list). The callback happens periodically, sometime before the current buffer finishes playing.
Bus 1: write to your app. Here you can take the audio produced by the RemoteIO unit and do something with it in your app. For example, you can connect the output to another audio unit or write the data to a file.
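To make the "give me samples to play" idea concrete, here is a minimal sketch of a bus-0 render callback that just synthesizes a tone. toneRenderCallback, the 440 Hz frequency, and the 16-bit mono 44.1 kHz format are all assumptions for illustration, not anything from the question.

#include <math.h>   // sin()

// Rough sketch: fill bus 0's buffer with a 440 Hz sine wave.
// Assumes a 16-bit mono format at 44100 Hz on the input scope of bus 0.
static OSStatus toneRenderCallback(void                       *inRefCon,
                                   AudioUnitRenderActionFlags *ioActionFlags,
                                   const AudioTimeStamp       *inTimeStamp,
                                   UInt32                      inBusNumber,
                                   UInt32                      inNumberFrames,
                                   AudioBufferList            *ioData)
{
    static double phase = 0.0;
    const double phaseStep = 2.0 * M_PI * 440.0 / 44100.0;
    SInt16 *out = (SInt16 *)ioData->mBuffers[0].mData;
    for (UInt32 frame = 0; frame < inNumberFrames; ++frame) {
        out[frame] = (SInt16)(sin(phase) * 32767.0 * 0.25);   // quarter volume
        phase += phaseStep;
        if (phase > 2.0 * M_PI) phase -= 2.0 * M_PI;
    }
    return noErr;
}

Attach it the same way the question attaches performRender (kAudioUnitProperty_SetRenderCallback on the input scope of bus 0) and the tone comes out of the speaker.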
So to answer your question "What does setting the 1 do in this scenario?"
Here is the spec from Apple for AudioUnitRender:
OSStatus AudioUnitRender ( AudioUnit                   inUnit,
                           AudioUnitRenderActionFlags *ioActionFlags,
                           const AudioTimeStamp       *inTimeStamp,
                           UInt32                      inOutputBusNumber,
                           UInt32                      inNumberFrames,
                           AudioBufferList            *ioData );
The 1 is the inOutputBusNumber: it says you are reading bus 1, whose output is the audio frames that have been produced (the microphone samples). So you can do whatever you want with that data here and it will NOT affect what is played at the speaker.
If you want to examine the mic data, use bus 1 on the input. If you want to examine speaker data, use bus 0 on the output.
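Mapped back onto the asker's performRender, the flow looks like this. The code is the same as in the question; only the comments are added here to show which bus carries what.

static OSStatus performRender(void                       *inRefCon,
                              AudioUnitRenderActionFlags *ioActionFlags,
                              const AudioTimeStamp       *inTimeStamp,
                              UInt32                      inBusNumber,
                              UInt32                      inNumberFrames,
                              AudioBufferList            *ioData)
{
    // Rendering bus 1 pulls the MICROPHONE frames into ioData.
    OSStatus err = AudioUnitRender(rioUnit, ioActionFlags, inTimeStamp,
                                   1, inNumberFrames, ioData);

    // ioData now holds mic samples -- this is the place to inspect or copy them.

    // Whatever is left in ioData when the callback returns is what bus 0 hands
    // to the SPEAKER -- but only for THIS audio unit. Zeroing it silences this
    // unit's own playback; it cannot touch the audio that WebRTC plays through
    // its own, separate audio unit.
    for (UInt32 i = 0; i < ioData->mNumberBuffers; ++i) {
        memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
    }
    return err;
}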
Note that you cannot do things that take a long time in these callbacks: anything that might block or take a while (writing to the network, writing to a file, printing) is not advised. In that case, hand the work off with GCD or something similar.
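A rough sketch of that hand-off follows. loggingQueue and handleCapturedAudio are hypothetical names; the key point is to copy the bytes first, because the callback's buffer is reused as soon as you return.

#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>

// Hypothetical serial queue, created once, e.g. in setupIOUnit:
//     loggingQueue = dispatch_queue_create("com.example.audiotap", DISPATCH_QUEUE_SERIAL);
static dispatch_queue_t loggingQueue;

// Call this from the render/input callback instead of doing slow work inline.
static void handleCapturedAudio(const AudioBufferList *ioData)
{
    // Copy immediately; the audio unit owns and reuses this buffer.
    NSData *chunk = [NSData dataWithBytes:ioData->mBuffers[0].mData
                                   length:ioData->mBuffers[0].mDataByteSize];
    dispatch_async(loggingQueue, ^{
        // Off the audio thread now: safe to append to a file, send over the
        // network, NSLog, etc.
        NSLog(@"captured %lu bytes", (unsigned long)chunk.length);
    });
}

Even the copy and dispatch above allocate memory, so the strictest real-time code would hand the samples to a preallocated ring buffer instead, but for simply inspecting the data this is usually good enough.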
Answer 2:
The RemoteIO Audio Unit does not change the audio input or output being used by any other audio API. It only captures microphone data into buffers, or plays audio data from buffers that are separate from the buffers being used by the other audio APIs.
Source: https://stackoverflow.com/questions/31973110/getting-mic-input-and-speaker-output-using-core-audio