Question

I have made a working speech-to-text program using the Google speech-to-text API that records speech and copies it into a .txt file. However, the Google speech API does not listen for very long (approximately 9 seconds). Is there any way to increase this, or is there a better API for use in Python that can write while listening?

import time
import speech_recognition as sr

r = sr.Recognizer()
# tell the program to use the microphone and listen for one phrase
with sr.Microphone() as source:
    audio = r.listen(source)

# try to transcribe what was heard
try:
    spoken = r.recognize_google(audio)
    print("I heard: " + spoken)
except sr.UnknownValueError:
    spoken = None
    print("Google could not understand the audio")
except sr.RequestError as e:
    spoken = None
    print("Could not reach the Google API: {}".format(e))

# write what was recorded by the mic into a .txt, only if recognition worked
if spoken is not None:
    with open("name-of-file.txt", "a") as f:
        f.write("\n")
        f.write(time.strftime("%H:%M:%S") + " " + time.strftime("%d/%m/%Y"))
        f.write("\n")
        f.write(spoken)

Expected result: the program listens and writes at the same time, or the program keeps listening until it is turned off.

Actual result: the program listens for about 9 seconds and then writes to the .txt file.
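One way to approximate "listen until turned off" is to call listen() in a loop and append each phrase to the file as soon as it is recognized. This is a minimal sketch, not part of the answer below: it assumes the SpeechRecognition and PyAudio packages are installed, and format_entry, append_entry and transcribe_until_stopped are hypothetical helper names. listen() (with its phrase_time_limit parameter), adjust_for_ambient_noise() and recognize_google() are the library's own calls.

```python
import time
from datetime import datetime


# Hypothetical helper: same layout as the question's code
# (blank line, "HH:MM:SS dd/mm/YYYY", newline, transcript).
def format_entry(text, when):
    return "\n" + when.strftime("%H:%M:%S %d/%m/%Y") + "\n" + text


# Hypothetical helper: append one transcript to the .txt file immediately.
def append_entry(path, text, when=None):
    with open(path, "a") as f:
        f.write(format_entry(text, when or datetime.now()))


# Hypothetical helper: listen phrase by phrase until Ctrl+C is pressed.
def transcribe_until_stopped(path, phrase_time_limit=10):
    # Imported lazily so the file helpers above work without the library.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)
        try:
            while True:
                # phrase_time_limit caps one phrase, so text is written regularly
                audio = r.listen(source, phrase_time_limit=phrase_time_limit)
                try:
                    append_entry(path, r.recognize_google(audio))
                except sr.UnknownValueError:
                    pass  # nothing intelligible in this phrase; keep listening
                except sr.RequestError as e:
                    print("API request failed:", e)
                    time.sleep(1)  # brief pause before trying the next phrase
        except KeyboardInterrupt:
            pass  # Ctrl+C ends the session; everything so far is already saved
```

Calling transcribe_until_stopped("name-of-file.txt") would keep writing until you press Ctrl+C. The microphone is not read while each phrase is being sent to Google; if that gap matters, the library also offers Recognizer.listen_in_background(source, callback), which runs the listening loop on a background thread and returns a function that stops it.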

Answer 1:

SpeechRecognition is a pretty good library, but I too have had to fight with recording lengths. Here's how I've worked around the problem:

Saving Audio to Disk

# read a saved WAV file instead of streaming from the microphone
with sr.AudioFile('path/to/audiofile.wav') as source:
    audio = r.record(source)  # reads the whole file by default

Pros: Recording to an audio file and then sending longer chunks to Google has given me more consistent recording lengths than streaming.

Cons: Depending on the size of the audio file, this can lengthen the response time by a couple of seconds, which might make it unusable in your case.
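Recording first and transcribing later could look like the following sketch. It is an illustration under assumptions, not the answerer's code: save_mic_recording, split_wav and transcribe_chunks are hypothetical names, the chunk length is a free parameter you would tune, and only record(), get_wav_data(), AudioFile and recognize_google() come from SpeechRecognition. The splitting itself uses Python's standard wave module.

```python
import os
import wave


# Hypothetical helper: record `seconds` of microphone audio and save it as WAV.
def save_mic_recording(path, seconds):
    import speech_recognition as sr  # lazy: needs SpeechRecognition + PyAudio

    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.record(source, duration=seconds)
    with open(path, "wb") as f:
        f.write(audio.get_wav_data())


# Hypothetical helper: split a WAV file into consecutive chunks of at most
# `chunk_seconds` each and return the chunk paths in order.
def split_wav(path, chunk_seconds, out_dir):
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, "chunk_{:03d}.wav".format(index))
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # header frame count is fixed up on write
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths


# Hypothetical helper: send each chunk to Google in turn and join the text.
def transcribe_chunks(paths):
    import speech_recognition as sr

    r = sr.Recognizer()
    texts = []
    for p in paths:
        with sr.AudioFile(p) as source:
            audio = r.record(source)
        try:
            texts.append(r.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # this chunk contained no recognizable speech
    return " ".join(texts)
```

Cutting at fixed intervals can split a word across two chunks, so longer chunks (or cutting at pauses) may give cleaner results.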

Minimizing Noise Floor

You're likely already aware that a better signal-to-noise ratio gives better STT accuracy, but I've also found it critical for getting good chunk sizes with the SpeechRecognition library.

Double-check that your noise floor is easily distinguishable from your source; recording the audio also helps you troubleshoot this. Sometimes the SpeechRecognition library cuts the audio off prematurely because it doesn't clearly detect that you are speaking.
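With a saved recording, one way to check the noise floor is to compare the RMS level of a stretch of background noise against a stretch of speech. listen() decides whether a buffer contains speech by comparing an energy level of this kind against r.energy_threshold, so a wide gap between the two is what you want. This is a standard-library sketch assuming 16-bit mono WAV data; rms_16bit, snr_db and wav_segment are hypothetical helper names.

```python
import math
import struct
import wave


# Hypothetical helper: RMS level of little-endian 16-bit mono PCM bytes.
def rms_16bit(frames):
    count = len(frames) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<{}h".format(count), frames[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


# Hypothetical helper: signal-to-noise ratio in dB, speech segment vs noise-only segment.
def snr_db(noise_frames, speech_frames):
    noise = rms_16bit(noise_frames)
    speech = rms_16bit(speech_frames)
    if noise == 0:
        return float("inf")
    if speech == 0:
        return float("-inf")
    return 20 * math.log10(speech / noise)


# Hypothetical helper: raw frames between two timestamps (seconds) of a 16-bit mono WAV.
def wav_segment(path, start_s, end_s):
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = w.getframerate()
        w.setpos(int(start_s * rate))
        return w.readframes(int((end_s - start_s) * rate))
```

For example, if you stay silent for the first second of a recording and then speak, snr_db(wav_segment(p, 0, 1), wav_segment(p, 2, 5)) estimates how far your voice sits above the background.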

If improving the quality or proximity of your microphone isn't possible, the library includes a tool that calibrates audio levels for optimal signal-to-noise distinction.

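To activate this feature, call it on the source before you listen. Instead of only the line:

audio=r.listen(source)

Try using:

r.adjust_for_ambient_noise(source)
audio=r.listen(source)

Note that adjust_for_ambient_noise() does not return any audio; it sets r.energy_threshold, which listen() then uses to tell speech from background noise.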

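Be aware that this feature adds a small amount of latency: the calibration itself reads the source for one second by default (set by its duration parameter). In some cases the recognizer will also keep listening indefinitely if you feed it noisy audio.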
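To see why calibration matters, it can help to model it as an exponential moving average: each buffer's measured energy, scaled by a ratio, pulls the threshold toward it. The function below is a simplified model written for illustration, not the library's source; calibrate_threshold is a hypothetical name, and its defaults (300, 0.15, 1.5) are chosen to mirror the defaults the library documents for energy_threshold, dynamic_energy_adjustment_damping and dynamic_energy_ratio (check your installed version).

```python
# Hypothetical, simplified model of ambient-noise calibration (not the library's code).
def calibrate_threshold(energies, seconds_per_buffer, threshold=300.0,
                        damping=0.15, ratio=1.5):
    # `damping` is the share of the old threshold kept after one second of audio,
    # so each buffer keeps damping ** seconds_per_buffer of it.
    keep = damping ** seconds_per_buffer
    for energy in energies:
        target = energy * ratio  # threshold should sit somewhat above the noise
        threshold = threshold * keep + target * (1 - keep)
    return threshold
```

Under this model, a quiet room pushes the threshold down toward 1.5 times its energy and a noisy room pushes it up. If the threshold ends up below the noise that follows, every buffer looks like speech and the recognizer never sees the pause it waits for, which matches the indefinite listening described above.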

Combining it All

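with sr.AudioFile('path/to/audiofile.wav') as source:
    r.adjust_for_ambient_noise(source)  # calibrates on (and consumes) the first second of the file
    audio = r.record(source)            # records the remainder of the file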

Here's a great guide for this library - The Ultimate Guide To Speech Recognition With Python

Source: https://stackoverflow.com/questions/54097204/how-to-increase-listen-time-in-google-speech-api

Tags

python

google-speech-api

How to increase listen time in google speech api?
