Python, speech_recognition tool does not recognize .wav file

我只是一个虾纸丫 submitted on 2019-12-14 02:06:55

Question


I have generated a .wav audio file containing speech with some interfering speech in the background. This code worked for me with a test .wav file:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.WavFile(wav_path) as source:
        audio = r.record(source)

    text = r.recognize_google(audio)

If I use my .wav file, I get the following error:

ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format

The situation slightly improves if I save this .wav file with soundfile:

    import soundfile as sf

    # Read the original file, then write the same samples to a new path.
    wav, samplerate = sf.read(wav_path)
    sf.write(saved_wav_path, wav, samplerate)

and then load the new saved_wav_path back into the first block of code. This time I get:

if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()

The audio files were saved as

    from scipy.io import wavfile
    wavfile.write(wav_path, fs, data)

where wav_path = 'data.wav'. Any ideas?

SOLUTION:

Saving the audio data the following way generates the correct .wav files:

    import wavio
    wavio.write(wav_path, data, fs, sampwidth=2)

Answer 1:


From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
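
To check whether a particular file is affected, one option is to look at the format tag stored in its header. The following is a minimal sketch that uses only the standard library; the helper names `wav_format_tag` and `readable_by_wave` are hypothetical and are not part of speech_recognition. A tag of 1 means integer PCM, and 3 means IEEE floating point.

```python
import struct
import wave


def wav_format_tag(path):
    """Return the format tag from the 'fmt ' chunk, or None if it is missing.

    1 means integer PCM, 3 means IEEE floating point.
    """
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12:
            return None
        riff, _size, wave_id = struct.unpack("<4sI4s", head)
        if riff != b"RIFF" or wave_id != b"WAVE":
            return None
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                return None
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                tag_bytes = f.read(2)
                if len(tag_bytes) < 2:
                    return None
                return struct.unpack("<H", tag_bytes)[0]
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)


def readable_by_wave(path):
    """Return True if the standard library `wave` module can open the file."""
    try:
        with wave.open(path, "rb") as wf:
            wf.getsampwidth()
        return True
    except (wave.Error, EOFError):
        return False
```

If `wav_format_tag(wav_path)` returns 3 and `readable_by_wave(wav_path)` returns False, the file was most likely written from a floating point array and needs to be saved again in an integer format.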

SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:

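    import numpy as np
    from scipy.io import wavfile

    # Convert `data` to 32 bit integers, scaled so the peak uses the full range:
    y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)

    # Passing an integer array makes wavfile.write create an integer file.
    wavfile.write(wav_path, fs, y)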

Then try to read that file with speech_recognition.

Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
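
For illustration, writing an integer file through Python's wave library can be sketched as below. This is not wavio's actual implementation, only a rough stand-in that assumes mono audio given as floats in the range [-1.0, 1.0]; the function name `write_pcm16` is hypothetical.

```python
import struct
import wave


def write_pcm16(path, samples, rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip each sample to [-1.0, 1.0], then scale to the signed 16-bit range.
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    # WAV stores samples as little-endian integers.
    frames = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)    # mono
        wf.setsampwidth(2)    # 2 bytes per sample = 16 bit
        wf.setframerate(rate)
        wf.writeframes(frames)
```

A file written this way has a sample width of 2 bytes, which corresponds to the `sampwidth=2` argument passed to `wavio.write` in the question's solution.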



Source: https://stackoverflow.com/questions/52249985/python-speech-recognition-tool-does-not-recognize-wav-file
