Trying to send a continuous audio stream from the microphone directly to the IBM Watson Speech to Text web service using the Java SDK. One of the examples provided with the distribution…
The Java SDK supports this and includes an example. Update your pom.xml with:
<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>3.3.1</version>
</dependency>
Here is an example of how to listen to your microphone.
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// These imports assume the java-sdk 3.x package layout.
import com.ibm.watson.developer_cloud.http.HttpMediaType;
import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;
import com.ibm.watson.developer_cloud.speech_to_text.v1.websocket.BaseRecognizeCallback;

public class MicrophoneStreamingExample {

  public static void main(String[] args) throws Exception {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<username>", "<password>");

    // Signed PCM AudioFormat: 16 kHz sample rate, 16-bit sample size, mono
    int sampleRate = 16000;
    AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

    if (!AudioSystem.isLineSupported(info)) {
      System.out.println("Line not supported");
      System.exit(0);
    }

    // Open the microphone line and wrap it in an AudioInputStream
    TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(format);
    line.start();
    AudioInputStream audio = new AudioInputStream(line);

    RecognizeOptions options = new RecognizeOptions.Builder()
        .continuous(true)
        .interimResults(true)
        .timestamps(true)
        .wordConfidence(true)
        //.inactivityTimeout(5) // use this to stop listening when the speaker pauses, e.g. for 5s
        .contentType(HttpMediaType.AUDIO_RAW + "; rate=" + sampleRate)
        .build();

    // Stream the microphone audio over a WebSocket and print every (interim and final) result
    service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
      @Override
      public void onTranscription(SpeechResults speechResults) {
        System.out.println(speechResults);
      }
    });

    System.out.println("Listening to your voice for the next 30s...");
    Thread.sleep(30 * 1000);

    // Closing the WebSocket's underlying InputStream will close the WebSocket itself.
    line.stop();
    line.close();

    System.out.println("Fin.");
  }
}
What you need to do is feed the audio to the STT service not as a file, but as a headerless stream of audio samples: you simply send the samples you capture from the microphone over a WebSocket. Set the content type to "audio/pcm; rate=16000", where 16000 is the sampling rate in Hz. If your sampling rate is different (it depends on how the microphone encodes the audio), replace 16000 with your value, for example 44100 or 48000.
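For instance, reusing the javax.sound.sampled.AudioFormat from the example above, you can build the content type string directly from the capture format so the declared rate always matches what the microphone actually delivers (a small sketch, with 44.1 kHz picked arbitrarily):

AudioFormat captureFormat = new AudioFormat(44100, 16, 1, true, false);
// getSampleRate() returns a float, so cast it before concatenating
String contentType = "audio/pcm; rate=" + (int) captureFormat.getSampleRate();
// -> "audio/pcm; rate=44100"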
When feeding PCM audio, the STT service won't stop recognizing until you signal the end of audio by sending an empty binary message over the WebSocket.
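The SDK takes care of this for you (its internal WebSocketUploader.stop() sends that empty frame, as discussed below), but if you are driving the WebSocket yourself, the signal is literally a zero-length binary message. A minimal sketch, assuming a standard JSR-356 (javax.websocket) client session rather than the SDK's own connection:

import java.io.IOException;
import java.nio.ByteBuffer;
import javax.websocket.Session;

class EndOfAudioSignal {
  // 'session' stands for an already-open javax.websocket connection to the STT
  // WebSocket endpoint. Sending a zero-length binary message tells the service
  // that no more audio is coming, so it can finalize the transcription.
  static void signalEndOfAudio(Session session) throws IOException {
    session.getBasicRemote().sendBinary(ByteBuffer.allocate(0));
  }
}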
Dani
Looking at the new version of your code, I see some issues:
1) Signaling the end of audio is done by sending an empty binary message through the WebSocket, which is not what you are doing. The lines
// signal end of audio; based on WebSocketUploader.stop() source
byte[] stopData = new byte[0];
output.write(stopData);
are not doing anything, since a zero-length write on the output stream won't result in an empty WebSocket message being sent. Can you please call the method "WebSocketUploader.stop()" instead?