Trying to send a continuous audio stream from the microphone directly to the IBM Watson Speech to Text web service using the Java SDK. One of the examples provided with the distribution…
The Java SDK supports this and includes an example. Update your pom.xml with:
<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>3.3.1</version>
</dependency>
Here is an example of how to listen to your microphone.
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// These imports assume the java-sdk 3.x package layout.
import com.ibm.watson.developer_cloud.http.HttpMediaType;
import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;
import com.ibm.watson.developer_cloud.speech_to_text.v1.websocket.BaseRecognizeCallback;

public class MicrophoneStreamingExample {

  public static void main(String[] args) throws Exception {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<username>", "<password>");

    // Signed PCM AudioFormat: 16 kHz sample rate, 16-bit sample size, mono
    int sampleRate = 16000;
    AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

    if (!AudioSystem.isLineSupported(info)) {
      System.out.println("Line not supported");
      System.exit(0);
    }

    // Open the microphone line and wrap it in an AudioInputStream
    TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(format);
    line.start();
    AudioInputStream audio = new AudioInputStream(line);

    RecognizeOptions options = new RecognizeOptions.Builder()
        .continuous(true)
        .interimResults(true)
        .timestamps(true)
        .wordConfidence(true)
        //.inactivityTimeout(5) // use this to stop listening when the speaker pauses, e.g. for 5s
        .contentType(HttpMediaType.AUDIO_RAW + "; rate=" + sampleRate)
        .build();

    // Stream the microphone audio over a WebSocket and print every (interim and final) result
    service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
      @Override
      public void onTranscription(SpeechResults speechResults) {
        System.out.println(speechResults);
      }
    });

    System.out.println("Listening to your voice for the next 30s...");
    Thread.sleep(30 * 1000);

    // Closing the WebSocket's underlying InputStream will close the WebSocket itself.
    line.stop();
    line.close();

    System.out.println("Fin.");
  }
}
What you need to do is feed the audio to the STT service not as a file, but as a headerless stream of audio samples: you simply send the samples you capture from the microphone over a WebSocket. Set the content type to "audio/pcm; rate=16000", where 16000 is the sampling rate in Hz. If your sampling rate is different (it depends on how the microphone encodes the audio), replace 16000 with your value, for example 44100 or 48000.
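For instance, reusing the javax.sound.sampled.AudioFormat from the example above, you can build the content type string directly from the capture format so the declared rate always matches what the microphone actually delivers (a small sketch, with 44.1 kHz picked arbitrarily):

AudioFormat captureFormat = new AudioFormat(44100, 16, 1, true, false);
// getSampleRate() returns a float, so cast it before concatenating
String contentType = "audio/pcm; rate=" + (int) captureFormat.getSampleRate();
// -> "audio/pcm; rate=44100"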
When feeding PCM audio, the STT service won't stop recognizing until you signal the end of audio by sending an empty binary message over the WebSocket.
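The SDK takes care of this for you (its internal WebSocketUploader.stop() sends that empty frame, as discussed below), but if you are driving the WebSocket yourself, the signal is literally a zero-length binary message. A minimal sketch, assuming a standard JSR-356 (javax.websocket) client session rather than the SDK's own connection:

import java.io.IOException;
import java.nio.ByteBuffer;
import javax.websocket.Session;

class EndOfAudioSignal {
  // 'session' stands for an already-open javax.websocket connection to the STT
  // WebSocket endpoint. Sending a zero-length binary message tells the service
  // that no more audio is coming, so it can finalize the transcription.
  static void signalEndOfAudio(Session session) throws IOException {
    session.getBasicRemote().sendBinary(ByteBuffer.allocate(0));
  }
}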
Dani
Looking at the new version of your code, I see some issues:
1) Signaling the end of audio is done by sending an empty binary message through the WebSocket, which is not what you are doing. The lines
// signal end of audio; based on WebSocketUploader.stop() source
byte[] stopData = new byte[0];
output.write(stopData);
are not doing anything, since a zero-length write on the output stream won't result in an empty WebSocket message being sent. Can you please call the method "WebSocketUploader.stop()" instead?