Stream audio from mic to IBM Watson SpeechToText Web service using Java SDK

我寻月下人不归 2020-12-16 06:04

I am trying to send a continuous audio stream from the microphone directly to the IBM Watson SpeechToText web service using the Java SDK. One of the examples provided with the distributi

2 answers
  •  暖寄归人
    2020-12-16 06:50

    What you need to do is feed the audio to the STT service not as a file, but as a headerless stream of audio samples: you send the samples captured from the microphone over a WebSocket. Set the content type to "audio/pcm; rate=16000", where 16000 is the sampling rate in Hz. If your sampling rate is different, which depends on how the microphone encodes the audio, replace 16000 with your value, for example 44100 or 48000.
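As a minimal sketch of the point above, the rate in the content type can be derived from the same `AudioFormat` that is used to open the microphone, so the two values cannot drift apart. The names `PcmContentType`, `micFormat` and `contentTypeFor` are made up for illustration; mono capture and little-endian byte order are assumptions, and only the "audio/pcm; rate=..." string itself comes from the answer.

```java
import javax.sound.sampled.AudioFormat;

final class PcmContentType {

    // Assumed capture format: 16-bit, mono, signed, little-endian PCM.
    static AudioFormat micFormat(float sampleRateHz) {
        return new AudioFormat(sampleRateHz, 16, 1, true, false);
    }

    // Builds the content type string from the format's actual sample rate.
    static String contentTypeFor(AudioFormat format) {
        return "audio/pcm; rate=" + Math.round(format.getSampleRate());
    }

    public static void main(String[] args) {
        // prints "audio/pcm; rate=16000"
        System.out.println(contentTypeFor(micFormat(16000f)));
    }
}
```

If the microphone only supports 44100 Hz, passing `44100f` to `micFormat` changes the content type accordingly.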

    When feeding PCM audio, the STT service won't stop recognizing until you signal the end of audio by sending an empty binary message over the WebSocket.
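The sequence described here (binary messages carrying samples, then one empty binary message) could be sketched roughly as follows. This is not the SDK's own uploader: `PcmStreamer` and the `BinarySender` hook are hypothetical names standing in for whatever WebSocket client is in use, and the chunk size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class PcmStreamer {

    /** Hypothetical transport hook: sends one binary WebSocket message. */
    interface BinarySender {
        void send(ByteBuffer message);
    }

    /**
     * Forwards raw PCM bytes in chunks, then signals end of audio with an
     * empty binary message. Returns the number of audio messages sent.
     */
    static int streamUntilEof(InputStream audio, BinarySender sender, int chunkBytes) {
        byte[] buffer = new byte[chunkBytes];
        int messages = 0;
        try {
            int n;
            // A microphone-backed stream only reaches EOF once the line is closed.
            while ((n = audio.read(buffer)) != -1) {
                if (n == 0) {
                    continue; // nothing read this round
                }
                sender.send(ByteBuffer.wrap(Arrays.copyOf(buffer, n)));
                messages++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Zero-length binary message: the end-of-audio signal.
        sender.send(ByteBuffer.allocate(0));
        return messages;
    }
}
```

With the JDK's `java.net.http.WebSocket` (Java 11 and later), the sender could be written as `msg -> ws.sendBinary(msg, true).join()`, where `ws` is an already opened connection.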

    Dani


    Looking at the new version of your code I see some issues:

    1) Signaling the end of audio is done by sending an empty binary message through the WebSocket, which is not what you are doing. The lines

     // signal end of audio; based on WebSocketUploader.stop() source
     byte[] stopData = new byte[0];
     output.write(stopData);
    

    do nothing, since they won't result in an empty WebSocket message being sent. Can you please call the method "WebSocketUploader.stop()" instead?

    2) You are capturing audio at 8 bits per sample; you should use 16 bits for adequate quality. Also, you are only feeding a couple of seconds of audio, which is not ideal for testing. Can you please write whatever audio you push to STT to a file and then open it with Audacity (using the import feature)? That way you can make sure that what you are feeding to STT is good audio.
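One possible way to do the file check suggested above is to wrap the captured bytes in a WAV header with `javax.sound.sampled`, so that Audacity can open the file directly instead of going through a raw import. This is only a sketch: `PcmDump` and `writeWav` are illustrative names, and the format (16-bit, mono, signed, little-endian) is an assumption that has to match what was actually captured.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class PcmDump {

    /** Writes headerless PCM bytes (the same bytes pushed to STT) as a WAV file. */
    static File writeWav(byte[] pcm, float sampleRateHz, File target) {
        // Assumed to mirror the capture format: 16 bits per sample, mono.
        AudioFormat format = new AudioFormat(sampleRateHz, 16, 1, true, false);
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream in =
                 new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```

Collecting the chunks into a `ByteArrayOutputStream` while streaming and calling `PcmDump.writeWav(collected.toByteArray(), 16000f, new File("capture.wav"))` afterwards would give a file to listen to; if it sounds distorted or plays at the wrong speed, the declared format probably does not match the capture.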
