Continuous speech recognition with SFSpeechRecognizer (iOS 10 beta)

情歌与酒 · 2020-12-08 01:28

I am trying to perform continuous speech recognition using AVCapture on iOS 10 beta. I have set up captureOutput(...) to continuously get CMSampleBuffers.

5 Answers
  •  半阙折子戏
    2020-12-08 01:53

    It turns out that Apple's new native speech recognition does not detect end-of-speech silences automatically (a bug?), which in your case is useful, because recognition stays active for nearly one minute (the maximum period permitted by Apple's service). So if you need continuous ASR, you must re-launch speech recognition whenever your delegate fires:

    func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) // fires whether successfully == true or not
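
    This restart can be wired up directly in that callback. A minimal sketch, assuming the `startNativeRecording()`/`stopNativeRecording()` helpers from this answer and a hypothetical `shouldKeepListening` flag that is not in the original code:

```swift
// Sketch only: restart recognition whenever a task finishes, so listening
// continues past Apple's roughly one-minute session limit.
func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {
    stopNativeRecording()            // tear down the old tap, task and timeout timer
    if shouldKeepListening {         // hypothetical flag you maintain yourself
        try? startNativeRecording()  // immediately begin a fresh recognition session
    }
}
```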
    

    Here is the recording/speech-recognition Swift code I use; it works perfectly. Ignore the part where I calculate the mean power of the microphone volume if you don't need it — I use it to animate a waveform. Don't forget to set the SFSpeechRecognitionTaskDelegate and its delegate methods; if you need extra code, let me know.

    func startNativeRecording() throws {
        LEVEL_LOWPASS_TRIG = 0.01
        // Set up the audio session and install a tap on the input node
        node = audioEngine.inputNode!
        let recordingFormat = node!.outputFormatForBus(0)
        node!.installTapOnBus(0, bufferSize: 1024, format: recordingFormat) { (buffer, _) in
            // Feed every captured buffer to the recognition request
            self.nativeASRRequest.appendAudioPCMBuffer(buffer)

            // Code to animate a waveform with the microphone volume; ignore if you don't need it.
            // buffer.floatChannelData access: see
            // https://github.com/apple/swift-evolution/blob/master/proposals/0107-unsaferawpointer.md
            let inNumberFrames: UInt32 = buffer.frameLength
            var peakValue: Float32 = 0
            // vDSP_maxmgv (Accelerate framework) returns the peak magnitude of the vector;
            // vDSP_meamgv would return the mean magnitude instead.
            vDSP_maxmgv(buffer.floatChannelData[0], 1, &peakValue, vDSP_Length(inNumberFrames))

            // Convert the peak to decibels (clamping silence to -100 dB) and
            // smooth it with a one-pole low-pass filter.
            let powerDb: Float32 = (peakValue == 0) ? -100.0 : 20.0 * log10f(peakValue)
            let averagePower = (self.LEVEL_LOWPASS_TRIG * powerDb)
                + ((1 - self.LEVEL_LOWPASS_TRIG) * self.averagePowerForChannel0)
            self.averagePowerForChannel0 = averagePower
            print("AVG. POWER: " + averagePower.description)

            dispatch_async(dispatch_get_main_queue()) { () -> Void in
                let fAvgPwr = CGFloat(averagePower)
                print("AvgPwr: " + fAvgPwr.description)

                var waveformFriendlyValue = 0.5 + fAvgPwr // -0.5 is the AvgPwr value when the user is silent
                if waveformFriendlyValue < 0 { waveformFriendlyValue = 0 } // clamp values < 0 to 0
                self.waveview.hidden = false
                self.waveview.updateWithLevel(waveformFriendlyValue)
            }
        }
        audioEngine.prepare()
        try audioEngine.start()
        isNativeASRBusy = true
        nativeASRTask = nativeSpeechRecognizer?.recognitionTaskWithRequest(nativeASRRequest, delegate: self)
        nativeSpeechRecognizer?.delegate = self
        // I use this timer to track no-speech timeouts; ignore if not needed:
        self.endOfSpeechTimeoutTimer = NSTimer.scheduledTimerWithTimeInterval(utteranceTimeoutSeconds,
            target: self, selector: #selector(ViewController.stopNativeRecording),
            userInfo: nil, repeats: false)
    }
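
    The dB conversion and one-pole low-pass smoothing that the tap block performs can be checked in isolation. A minimal sketch of just that math — the function name `smoothedPower` and the `alpha` parameter are illustrative, not part of the original code:

```swift
import Foundation

/// Convert a peak sample magnitude to decibels (clamping silence to -100 dB),
/// then smooth it with a one-pole low-pass filter, as the tap block does
/// with LEVEL_LOWPASS_TRIG = 0.01.
func smoothedPower(peak: Float, previous: Float, alpha: Float = 0.01) -> Float {
    let db: Float = (peak == 0) ? -100.0 : 20.0 * log10f(peak)
    return alpha * db + (1 - alpha) * previous
}

// A full-scale peak (1.0) is 0 dB, so the smoothed value decays toward 0;
// silence pulls it toward the -100 dB floor at a rate set by alpha.
print(smoothedPower(peak: 1.0, previous: -50.0))
print(smoothedPower(peak: 0.0, previous: 0.0))
```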
    
