How to recognise when the user starts and stops speaking in Android? (Voice recognition in Android)

Submitted by 徘徊边缘 on 2019-12-04 08:01:54

Question


I have done a lot of research and gone through many resources, but I have failed to find a proper solution to my problem.

I have developed an app, and now I want to add voice-based functionality to it.

The required features are:

1) When the user starts speaking, the app should record the audio/video, and

2) when the user stops speaking, it should play back the recorded audio/video.

Note: Here "video" means whatever the user does within the app during that period of time, for example clicking buttons or triggering some kind of animation.

I don't want to use Google's voice recognizer that is available by default in Android, because it requires an Internet connection and my app runs offline. I also came across CMU-Sphinx, but it does not fit my requirements.

EDIT: I would also like to add that I have already achieved this using Start and Stop buttons, but I don't want to use these buttons.

If anyone has any ideas or suggestions, please let me know.


Answer 1:


The simplest and most common method is to count the number of zero crossings in the audio (i.e. how often the sign of the signal changes, from positive to negative or vice versa).

If that count is too high, the sound is unlikely to be speech. If it is too low then, again, it is unlikely to be speech.

Combine that with a simple energy level (how loud the audio is) and you have a solution that is pretty robust.
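
The combination described above could be sketched in Java roughly as follows. This is only an illustration: the class name `FrameVad` and all three threshold values are assumptions made up for this example, not values from the answer, and they would need tuning against real recordings from the target device.

```java
/** Hypothetical helper: frame-level speech check from zero-crossing rate plus energy. */
final class FrameVad {
    // Assumed thresholds for illustration only; tune them on real recordings.
    static final double MIN_ZCR = 0.02;  // zero crossings per sample
    static final double MAX_ZCR = 0.35;
    static final double MIN_RMS = 500.0; // in 16-bit PCM amplitude units

    /** Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative). */
    static double zeroCrossingRate(short[] frame) {
        if (frame.length < 2) {
            return 0.0;
        }
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            boolean prevNegative = frame[i - 1] < 0;
            boolean currNegative = frame[i] < 0;
            if (prevNegative != currNegative) {
                crossings++;
            }
        }
        return (double) crossings / (frame.length - 1);
    }

    /** Root-mean-square amplitude of the frame, used as a simple energy level. */
    static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (short s : frame) {
            sumSquares += (double) s * s;
        }
        return Math.sqrt(sumSquares / frame.length);
    }

    /** True when the frame is loud enough and its crossing rate is neither too low nor too high. */
    static boolean looksLikeSpeech(short[] frame) {
        double zcr = zeroCrossingRate(frame);
        return rms(frame) >= MIN_RMS && zcr >= MIN_ZCR && zcr <= MAX_ZCR;
    }
}
```

On Android the frames would typically be short blocks of 16-bit PCM read from the microphone; recording would start once a few consecutive frames look like speech and stop after a run of frames that do not.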

If you need a more accurate system, it gets much, much more complex. One way is to extract audio features (MFCCs, for example) from "training data", model them with something like a GMM, and then test the features you extract from live audio against the GMM. This way you can model the likelihood that a given frame of audio is speech rather than non-speech. This is not a simple process, however.

I'd strongly recommend going down the zero-crossings route, as it is simple to implement and works fine 99% of the time :)




Answer 2:


You can try adding listeners to application events such as navigation, clicks, animations, etc. In the listener implementations you can trigger the start/stop functionality.

http://tseng-blog.nge-web.net/blog/2009/02/14/implementing-listeners-in-your-android-java-application/

Take a look at these examples; they might be helpful to you.


But I'm wondering: from what you describe about your application's behaviour, it looks like you are going to reinvent something like Talking Tom, huh? :-P




Answer 3:


Below is the code I use for an iPhone application that does exactly the same thing. The code is in Objective-C++, but I have put lots of comments in it. This code is executed inside the callback function of a recording queue. I am sure that a similar approach exists for the Android platform.

This approach works very nicely in almost every acoustic environment I have tried it in, and it is used in our app. You can download it to test it if you want.

Try implementing it on the Android platform and you are done!

    // If there are some audio samples in the audio buffer of the recording queue
    if (inNumPackets > 0) {
        // The following 4 lines of code are vector functions that compute
        // the average power of the current audio samples.
        // See Apple's vDSP documentation for details about them.
        vDSP_vflt16((SInt16*)inBuffer->mAudioData, 1, aqr->currentFrameSamplesArray, 1, inNumPackets);
        vDSP_vabs(aqr->currentFrameSamplesArray, 1, aqr->currentFrameSamplesArray, 1, inNumPackets);
        vDSP_vsmul(aqr->currentFrameSamplesArray, 1, &aqr->divider, aqr->currentFrameSamplesArray, 1, inNumPackets);
        vDSP_sve(aqr->currentFrameSamplesArray, 1, &aqr->instantPower, inNumPackets);
        // InstantPower holds the energy for the current audio samples
        aqr->instantPower /= (CGFloat)inNumPackets;
        // Important: add a small number to instantPower before taking the log
        // to avoid +-infs and NaNs
        aqr->instantPower = log10f(aqr->instantPower + 0.001f);
        // InstantAvgPower holds the energy for a bigger window 
        // of time than InstantPower
        aqr->instantAvgPower = aqr->instantAvgPower * 0.95f + 0.05f * aqr->instantPower;
        // AvgPower holds the energy for an even bigger window 
        // of time than InstantAvgPower
        aqr->avgPower = aqr->avgPower * 0.97f + 0.03f * aqr->instantAvgPower;
        // This is the ratio that tells us when to record
        CGFloat ratio = aqr->avgPower / aqr->instantPower;
        // If we are not already writing to an audio file and 
        // the ratio is bigger than a specific hardcoded value 
        // (this value has to do with the quality of the microphone 
        // of the device. I have set it to 1.5 for an iPhone) then start writing!
        if (!aqr->writeToFile && ratio > aqr->recordingThreshold) {
            aqr->writeToFile = YES;
        } 
        if (aqr->writeToFile) {
            // write packets to file
            XThrowIfError(AudioFileWritePackets(aqr->mRecordFile, FALSE, inBuffer->mAudioDataByteSize,
                                                inPacketDesc, aqr->mRecordPacket, &inNumPackets, inBuffer->mAudioData),
                          "AudioFileWritePackets failed");
            aqr->mRecordPacket += inNumPackets;
            // Now if we are recording but the instantAvgPower is lower 
            // than avgPower then we increase the countToStopRecording counter
            if (aqr->instantAvgPower < aqr->avgPower) {
                aqr->countToStopRecording++;
            } 
            // or else reset it to 0.
            else {
                aqr->countToStopRecording = 0;
            }
            // If we have detected that there is not enough power in 30 consecutive
            // audio sample buffers OR we have recorded TOO much audio 
            // (the user speaks for more than a threshold of time) stop recording 
            if (aqr->countToStopRecording > 30 || aqr->mRecordPacket > kMaxAudioPacketsDuration) {
                aqr->countToStopRecording = 0;
                aqr->writeToFile = NO;
                // Notify the audio player that we finished recording 
                // and start playing the audio!!!
                dispatch_async(dispatch_get_main_queue(), ^{[[NSNotificationCenter defaultCenter] postNotificationName:@"RecordingEndedPlayNow" object:nil];});
            }
        }
    }

Best!
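
For Android readers, the same idea could be ported to Java roughly as sketched below. This is not the answer author's code: the class name `AdaptiveEnergyGate` is made up, the scaling by 32768 is only a guess at what the original `divider` does, the starting value of the two averages is an assumption, and the maximum-duration stop condition is left out for brevity.

```java
/** Hypothetical Java port of the adaptive energy gate above; not the original author's code. */
final class AdaptiveEnergyGate {
    // Assumed starting level: log10 of the 0.001 offset, i.e. what pure silence produces.
    private static final double SILENCE_LEVEL = Math.log10(0.001);

    private final double recordingThreshold; // the answer uses 1.5 on an iPhone
    private final int quietBuffersToStop;    // the answer uses 30
    private double instantAvgPower = SILENCE_LEVEL;
    private double avgPower = SILENCE_LEVEL;
    private int countToStopRecording = 0;
    private boolean recording = false;

    AdaptiveEnergyGate(double recordingThreshold, int quietBuffersToStop) {
        this.recordingThreshold = recordingThreshold;
        this.quietBuffersToStop = quietBuffersToStop;
    }

    boolean isRecording() {
        return recording;
    }

    /** Processes one buffer of 16-bit PCM samples and returns whether recording is active. */
    boolean process(short[] samples) {
        if (samples.length == 0) {
            return recording;
        }
        // Mean absolute amplitude scaled to 0..1 (assumed meaning of the answer's "divider").
        double sum = 0.0;
        for (short s : samples) {
            sum += Math.abs((int) s) / 32768.0;
        }
        // Log energy of this buffer; the small offset avoids log10(0).
        double instantPower = Math.log10(sum / samples.length + 0.001);

        // Two exponential moving averages: a short window and a longer one.
        instantAvgPower = instantAvgPower * 0.95 + 0.05 * instantPower;
        avgPower = avgPower * 0.97 + 0.03 * instantAvgPower;

        // Below full scale both levels are negative, so a loud buffer makes this ratio grow.
        double ratio = avgPower / instantPower;
        if (!recording && ratio > recordingThreshold) {
            recording = true;
        }

        if (recording) {
            // Count consecutive buffers in which the short average is below the long one.
            if (instantAvgPower < avgPower) {
                countToStopRecording++;
            } else {
                countToStopRecording = 0;
            }
            if (countToStopRecording > quietBuffersToStop) {
                countToStopRecording = 0;
                recording = false; // the caller would start playback here
            }
        }
        return recording;
    }
}
```

Each buffer read from the microphone would be passed to `process`; a change from `true` to `false` in the return value marks the point where playback can begin. Input close to full scale pushes the log energy towards zero, an edge case this sketch does not handle.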




Answer 4:


Here is some simple code that detects when the user stops speaking. It checks the following value:

recorder.getMaxAmplitude();

Sample code:

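public void startRecording() throws IOException {

    // Poll the recorder's peak amplitude every 100 ms on a background thread.
    Thread thread = new Thread() {
        @Override
        public void run() {
            // Keep polling until this thread is interrupted.
            while (!isInterrupted()) {
                try {
                    sleep(100);

                    if (recorder != null) {
                        // getMaxAmplitude() returns the peak since the previous call.
                        checkValue(recorder.getMaxAmplitude());
                    }
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop condition ends the thread.
                    interrupt();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    };
    thread.start();
}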

checkValue function:

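public void checkValue(int amplitude) {
    try {
        if (amplitude > 1000) {
            // Loud enough to count as speech.
            Log.d("I", "Amplitude : " + amplitude);
            // Reading again resets the peak, so the next poll only reflects newer audio.
            recorder.getMaxAmplitude();
            // Wait 2 s before the next check so that short pauses do not end the recording.
            Thread.sleep(2000);
            isListened = true;
        } else if (isListened) {
            // Quiet after speech has been heard: treat it as the end of speech.
            Log.d("I", "Stop me");
            recordingDialog.dismiss();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}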

I know this question is very old and has already been answered, but this small code snippet might help someone else.
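
The stop decision in this answer can also be kept separate from the polling thread, which makes it easier to test without a device. The following is a minimal sketch under assumptions: the class `EndOfSpeechDetector` and its parameters are hypothetical names introduced here, and the readings are assumed to come from periodic calls to `recorder.getMaxAmplitude()` as in the answer.

```java
/** Hypothetical helper: decides when speech has ended from periodic amplitude readings. */
final class EndOfSpeechDetector {
    private final int amplitudeThreshold; // readings above this count as speech
    private final long silenceTimeoutMs;  // quiet time required before reporting the end
    private boolean heardSpeech = false;
    private long lastLoudAtMs = 0L;

    EndOfSpeechDetector(int amplitudeThreshold, long silenceTimeoutMs) {
        this.amplitudeThreshold = amplitudeThreshold;
        this.silenceTimeoutMs = silenceTimeoutMs;
    }

    /**
     * Feeds one amplitude reading taken at time nowMs (milliseconds).
     * Returns true once speech has been heard and the input has then
     * stayed at or below the threshold for at least silenceTimeoutMs.
     */
    boolean onAmplitude(int amplitude, long nowMs) {
        if (amplitude > amplitudeThreshold) {
            heardSpeech = true;
            lastLoudAtMs = nowMs;
            return false;
        }
        return heardSpeech && (nowMs - lastLoudAtMs) >= silenceTimeoutMs;
    }

    /** Forgets earlier speech so the detector can be reused for the next utterance. */
    void reset() {
        heardSpeech = false;
    }
}
```

With a threshold of 1000 and a timeout of 2000 ms this behaves much like the snippet above, but without sleeping inside the check; the timestamps could be taken from a monotonic clock such as `SystemClock.elapsedRealtime()` on Android.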



Source: https://stackoverflow.com/questions/9788674/how-to-recognise-when-user-start-stop-speaking-in-android-voice-recognition
