How to increase listen time in google speech api?

半城伤御伤魂 submitted on 2020-01-24 11:49:05

Question


I have made a working speech-to-text program using the Google speech-to-text API that records speech and copies it into a .txt file. However, the Google speech API does not listen for very long (approx. 9 seconds). Is there any way to increase this, or is there a better API for use in Python that can write while listening?

import time
import speech_recognition as sr

r = sr.Recognizer()
#tells the program to use a mic and to listen
with sr.Microphone() as source:
    audio = r.listen(source)
#asking the program to try to recognize what it heard
spoken = ""  # stays empty if recognition fails, so the write below still works
try:
    spoken = r.recognize_google(audio)

    print("I heard: " + spoken)

except (sr.UnknownValueError, sr.RequestError):
    print("Something went wrong")
#writing what was recorded by the mic into a .txt
with open("name-of-file.txt", "a") as f:
    f.write("\n")
    f.write(time.strftime("%H:%M:%S") + " " + time.strftime("%d/%m/%Y"))
    f.write("\n")
    f.write(spoken)

Expected result: the program listens and writes at the same time, or the program keeps listening until it is turned off. Actual result: the program listens for about 9 seconds and then writes to the .txt file.


Answer 1:


SpeechRecognition is a pretty good library, but I too have had to fight with recording lengths. Here's how I've worked around the problem:

Saving Audio to Disk

import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('path/to/audiofile.wav') as source:
    audio = r.record(source)

Pros: recording to an audio file and then sending longer chunks to Google has given me more consistent recording lengths than streaming.

Cons: depending on the size of the audio file, this can lengthen the response time by a couple of seconds, which might be unacceptable in your case.
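Saving the audio yourself needs nothing beyond the standard library. The sketch below is only an illustration: it assumes 16 kHz, 16-bit mono PCM, and the helper names save_wav and iter_wav_chunks are hypothetical. It writes raw samples to a WAV file and reads them back in fixed-length chunks, which is roughly what "sending longer chunks" amounts to. Within SpeechRecognition itself, AudioData.get_wav_data() gives you WAV bytes to write to disk, and r.record(source, duration=...) reads a file piece by piece.

```python
import wave


def save_wav(path, pcm_bytes, sample_rate=16000, sample_width=2, channels=1):
    """Write raw little-endian PCM samples to a WAV file on disk (hypothetical helper)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)


def iter_wav_chunks(path, chunk_seconds):
    """Yield (start_seconds, pcm_bytes) for consecutive fixed-length chunks.

    Every chunk is chunk_seconds long except possibly the last one,
    which holds whatever audio is left over (hypothetical helper).
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames_per_chunk = int(rate * chunk_seconds)
        start_frame = 0
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield start_frame / rate, data
            start_frame += frames_per_chunk
```

Each yielded chunk could then be handed to a recognizer; longer chunks mean fewer requests but a longer wait for each result, which is the trade-off described in the pros and cons.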

Minimizing Noise Floor

You're likely already aware that a better signal-to-noise ratio gives better speech-to-text (STT) accuracy, but I've also found it critical for getting good chunk sizes out of the SpeechRecognition library.

Double-check that your noise floor is easily distinguishable from your voice. Recording the audio to a file also helps you troubleshoot this. With the SpeechRecognition library, the recording sometimes cuts off prematurely because the library doesn't clearly detect that you are still speaking.
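To make the cut-off behaviour concrete, here is a simplified model of energy-based phrase detection. It is not the library's code: it only assumes that a phrase starts when a chunk's energy exceeds energy_threshold and ends after pause_threshold seconds of quieter chunks (the library's Recognizer has attributes with those names; pause_threshold defaults to 0.8 s). The function name find_phrase and the example energy values are hypothetical.

```python
def find_phrase(chunk_energies, energy_threshold, seconds_per_chunk=0.1, pause_threshold=0.8):
    """Return (start_index, end_index) of the first detected phrase.

    Returns None if no chunk is louder than energy_threshold, and
    (start_index, None) if the audio never goes quiet long enough to end
    the phrase (the "listens indefinitely" case). Simplified model only.
    """
    # Number of consecutive quiet chunks that counts as "the speaker stopped".
    pause_chunks = max(1, round(pause_threshold / seconds_per_chunk))
    start = None
    quiet_chunks = 0
    for i, energy in enumerate(chunk_energies):
        if energy > energy_threshold:
            if start is None:
                start = i  # first loud chunk: the phrase begins
            quiet_chunks = 0  # any loud chunk resets the pause counter
        elif start is not None:
            quiet_chunks += 1
            if quiet_chunks >= pause_chunks:
                return start, i  # long enough pause: the phrase is over
    return None if start is None else (start, None)
```

With a threshold of 300, speech that dips to an energy of 200 for 0.8 s mid-sentence, as in find_phrase([900, 900] + [200] * 8 + [900, 900], 300), ends the phrase at index 9 and drops the speech after it, while background noise that stays above 300 never produces an end at all.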

If improving the quality or proximity of your microphone isn't possible, the library includes a method that calibrates the recognizer's energy threshold so that speech is easier to distinguish from background noise.
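The idea behind that calibration can be sketched with the standard library alone: measure the root-mean-square (RMS) energy of a stretch of background audio and move the threshold toward a multiple of it. This is a sketch under assumptions, not the library's implementation: the helper names are hypothetical, and the constants 0.15 (per-second damping) and 1.5 (ratio) are my reading of the defaults of the library's dynamic_energy_adjustment_damping and dynamic_energy_ratio attributes, so treat them as assumptions. Samples are assumed to be little-endian 16-bit mono.

```python
import math
import struct


def rms_16bit(pcm_bytes):
    """Root-mean-square energy of little-endian 16-bit mono PCM samples."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def calibrate_threshold(ambient_chunks, seconds_per_chunk, threshold=300.0,
                        damping_per_second=0.15, ratio=1.5):
    """Nudge threshold toward ratio * ambient RMS, one chunk at a time.

    damping_per_second and ratio are assumed values (see lead-in). With a
    steady noise level the threshold converges exponentially on
    ratio * noise RMS; after one second it has covered 85% of the gap.
    """
    damping = damping_per_second ** seconds_per_chunk
    for chunk in ambient_chunks:
        target = rms_16bit(chunk) * ratio
        threshold = threshold * damping + target * (1 - damping)
    return threshold
```

For one second of steady noise with an RMS of 1000, sampled as ten 0.1 s chunks, the threshold moves from 300 to 1500 - 1200 * 0.15 = 1320, so anything quieter than that background level is treated as silence.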

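To activate this feature, keep the line:

audio=r.listen(source)

but calibrate right before it:

r.adjust_for_ambient_noise(source)
audio=r.listen(source)

Note that adjust_for_ambient_noise() does not return any audio; it only updates the recognizer's energy_threshold, so its result should not be assigned to audio.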

Be aware that calibration adds a small amount of latency, since by default it samples one second of ambient audio (set by its duration argument). In other cases, the recognizer may keep listening indefinitely if you feed it noisy audio.
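Since listen() returns after each pause, one way to get the asker's expected result (keep listening until stopped and write as you go) is to call it in a loop and append each recognized phrase immediately. The sketch below is an assumption-laden illustration: transcribe_forever and its parameters are hypothetical names, and the recognizer is duck-typed (any object with listen() and recognize_google()) so it can be exercised without a microphone. The library also offers Recognizer.listen_in_background() as a built-in alternative.

```python
import time


def transcribe_forever(recognizer, source, out_path, max_phrases=None,
                       recoverable=(Exception,)):
    """Listen phrase by phrase and append each transcript to out_path.

    Runs until max_phrases phrases have been heard (forever if None) or
    until interrupted, e.g. with Ctrl+C, since KeyboardInterrupt is not an
    Exception subclass. Phrases whose recognition raises one of the
    `recoverable` exception types are skipped.

    Hypothetical usage with SpeechRecognition:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source)
            transcribe_forever(r, source, "name-of-file.txt",
                               recoverable=(sr.UnknownValueError,))
    """
    heard = 0
    while max_phrases is None or heard < max_phrases:
        audio = recognizer.listen(source)  # returns after the next pause
        heard += 1
        try:
            text = recognizer.recognize_google(audio)
        except recoverable:
            continue  # unintelligible phrase: keep listening
        # Open, append and close per phrase so the file is up to date
        # even if the program is killed mid-session.
        with open(out_path, "a") as f:
            f.write(time.strftime("%H:%M:%S %d/%m/%Y") + " " + text + "\n")
```

Writing each phrase as soon as it is recognized means the text file grows while you talk, instead of being written once at the end.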

Combining it All

with sr.AudioFile('path/to/audiofile.wav') as source:
    r.adjust_for_ambient_noise(source)  # calibrates on the first second of the file
    audio = r.record(source)  # reads the rest of the file

Here's a great guide for this library: The Ultimate Guide To Speech Recognition With Python



Source: https://stackoverflow.com/questions/54097204/how-to-increase-listen-time-in-google-speech-api
