Python SpeechRecognition word by word? continuous output?

Question

Is there a way to output each word as soon as it is recognized? For example, if I say "hello world", it should output:

hello 
world

Currently I'm using this code:

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    while True:
        r.pause_threshold=0.1  # I tried adjusting these three settings, but no luck
        r.phrase_threshold=0.5
        r.non_speaking_duration=0.1
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print(text)
        except Exception as e:
            print("-")

This records until the microphone stops picking up sound and then prints everything it heard on one line. I want to see what has been said as quickly as possible.

Answer 1:

There are streaming libraries that do this. One is Google's speech API Python client; another is Vosk (https://github.com/alphacep/vosk-api). With Vosk, the Python code should look like this; it returns results immediately as you speak.

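import os
import sys

import pyaudio
from vosk import Model, KaldiRecognizer

# The model must be downloaded and unpacked before running this script.
if not os.path.exists("model-en"):
    print("Please download the model from https://github.com/alphacep/vosk-android-demo/releases and unpack as 'model-en' in the current folder.")
    sys.exit(1)

# Open the default microphone as 16 kHz, 16-bit mono audio.
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
stream.start_stream()

model = Model("model-en")
rec = KaldiRecognizer(model, 16000)

while True:
    data = stream.read(2000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # End of an utterance: print the final result (a JSON string).
        print(rec.Result())
    else:
        # Still mid-utterance: print the partial hypothesis so far.
        print(rec.PartialResult())

# Flush whatever is left once the stream ends.
print(rec.FinalResult())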
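
The loop above prints whole JSON strings, and each partial result repeats everything recognized so far in the current utterance. To get one word per line, as asked in the question, one option is to remember how many words were already printed and emit only the new ones. The following is a minimal sketch of that idea, not part of Vosk itself: WordTracker and its feed method are hypothetical helper names, and it assumes the partial hypothesis sits under the "partial" key and the final one under the "text" key of the returned JSON. The recognizer may revise earlier words in a later partial result; this sketch ignores such revisions and only prints words beyond the count already emitted.

```python
import json


class WordTracker:
    """Hypothetical helper: emits only words not yet seen in the current utterance."""

    def __init__(self):
        # Number of words already emitted for the current utterance.
        self.emitted = 0

    def feed(self, payload, final=False):
        """Return the new words contained in a recognizer JSON string.

        payload is the string returned by PartialResult() (final=False)
        or by Result() / FinalResult() (final=True).
        """
        key = "text" if final else "partial"
        words = json.loads(payload).get(key, "").split()
        fresh = words[self.emitted:]
        if final:
            # The utterance is over, so start counting from zero again.
            self.emitted = 0
        else:
            self.emitted += len(fresh)
        return fresh


if __name__ == "__main__":
    # Simulated recognizer output for someone saying "hello world".
    tracker = WordTracker()
    events = [
        ('{"partial": ""}', False),
        ('{"partial": "hello"}', False),
        ('{"partial": "hello world"}', False),
        ('{"text": "hello world"}', True),
    ]
    for payload, final in events:
        for word in tracker.feed(payload, final=final):
            print(word)
```

In the recognition loop, the two print calls would then become something like tracker.feed(rec.Result(), final=True) and tracker.feed(rec.PartialResult()), printing each returned word on its own line.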

Source: https://stackoverflow.com/questions/60684279/python-speechrecognition-word-by-word-continuous-output

Tags

python

speech-recognition

speech-to-text