'Audio data must be audio data' error with Google speech recognition in Python

Question

I am trying to load an audio file in Python and process it with Google speech recognition.

The problem is that, unlike C++, Python doesn't make data types or classes explicit, and it doesn't give you direct access to memory. So I can't convert between data types the way I would in C++, by creating a new object and repacking the data.

I don't understand how it's possible to convert from one data type to another in Python.
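For reference, Python does expose types at runtime (type(), plus NumPy's dtype and shape attributes), and conversion works by building a new object from the old one rather than by reinterpreting memory. A minimal sketch; the small array is a hypothetical stand-in for the float32 samples that librosa.load returns, so it runs without an audio file:

```python
import numpy as np

# Hypothetical stand-in for the samples returned by librosa.load (float32, range [-1, 1])
audio = np.array([0.0, 0.5, -0.25, 1.0], dtype=np.float32)

print(type(audio))  # the Python class of the object
print(audio.dtype)  # the element type stored inside the array
print(audio.shape)  # the number of samples

# astype() builds a NEW array with a different element type
# (fractional parts are truncated toward zero)
as_int = (audio * 32767).astype(np.int16)

# tobytes() repacks the array's memory into a plain bytes object
raw = as_int.tobytes()
print(type(raw), len(raw))  # 4 samples * 2 bytes each
```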

The code in question is below:

import speech_recognition as spr 
import librosa

audio, sr = librosa.load('sample_data/metal.mp3')

# create a speech recognition object 
r = spr.Recognizer() 

r.recognize_google(audio)

The error is:

audio_data must be audio data

How do I convert the audio object so that it can be used with Google speech recognition?

Answer 1:

librosa returns a NumPy array of floating-point samples, so you need to convert it back to 16-bit PCM bytes (the sample format used in WAV files). Something like this:

import numpy as np
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()  # peak-normalize, scale to int16, pack as bytes
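The snippet above divides by np.max(np.abs(audio)), which yields NaNs for a silent (all-zero) clip. A slightly more defensive sketch, assuming mono float samples as returned by librosa.load; the helper names float_to_pcm16 and pcm16_to_wav_bytes are hypothetical. Because the answer mentions WAV, the second helper wraps the PCM bytes in an in-memory WAV file using the standard-library wave module:

```python
import io
import wave

import numpy as np


def float_to_pcm16(samples: np.ndarray) -> bytes:
    """Peak-normalize float samples and convert them to 16-bit PCM bytes."""
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak > 0:
        samples = samples / peak  # loudest sample becomes +/-1.0; silence is left as zeros
    # np.clip keeps values inside the int16 range before the cast
    pcm = np.clip(samples * 32767, -32768, 32767).astype("<i2")  # little-endian int16
    return pcm.tobytes()


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap mono 16-bit PCM bytes in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono, which librosa.load returns by default
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The WAV bytes can then be read by SpeechRecognition's own loader: open spr.AudioFile(io.BytesIO(wav_bytes)) in a with block, call r.record(source) to get an AudioData object, and pass that to r.recognize_google.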

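The raw_audio bytes can then be wrapped as spr.AudioData(raw_audio, sr, 2), where 2 is the sample width in bytes for int16, and passed to r.recognize_google(). That said, you are probably better off loading the MP3 with an ffmpeg wrapper instead of librosa, because librosa.load alters the audio: by default it rescales the samples to floating point, mixes down to mono and resamples to 22,050 Hz. It's better to work with the raw data.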
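For the ffmpeg route, a sketch that calls the ffmpeg command-line tool directly through subprocess rather than through a Python wrapper library (an assumption about the setup: ffmpeg must be installed and on PATH). It decodes the file straight to raw signed 16-bit little-endian mono PCM, the same frame layout a 16-bit mono WAV file uses. The function names and the 16000 Hz default rate are hypothetical choices:

```python
import subprocess


def ffmpeg_decode_cmd(path: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg",
        "-v", "error",            # only print real errors
        "-i", path,               # input file (MP3 or anything ffmpeg can read)
        "-f", "s16le",            # raw output, no container
        "-acodec", "pcm_s16le",   # signed 16-bit little-endian samples
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the requested rate
        "-",                      # write to stdout
    ]


def decode_to_pcm16(path: str, sample_rate: int = 16000) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes."""
    result = subprocess.run(
        ffmpeg_decode_cmd(path, sample_rate),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
    return result.stdout
```

Usage would be along the lines of spr.AudioData(decode_to_pcm16("sample_data/metal.mp3"), 16000, 2), with the sample rate passed to AudioData matching the one requested from ffmpeg.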

Source: https://stackoverflow.com/questions/60879469/audio-data-must-be-audio-data-error-with-google-speech-recognition-in-python

Tags

python

windows

speech-recognition

google-speech-api

librosa
