UnicodeDecodeError from sound file

Submitted by 瘦欲@ on 2019-12-24 16:34:09

Question


I'm trying to build a speech recogniser in Python using the Google Speech API. I've been using and adapting the code from here (converted to Python 3). The audio file on my computer was converted from MP3 to FLAC at 16000 Hz (as specified in the original code) using an online converter. When I run the code, I get this error:

$ python3 speech_api.py 02-29-2016_00-12_msg1.flac 
Traceback (most recent call last):
  File "speech_api.py", line 12, in <module>
    data = f.read()
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 9: invalid start byte

This is my code. (I'm sure there are also still things that don't work in Python3, as I've been trying to adapt it and am new to urllib...)

#!/usr/bin/python
import sys
from urllib.request import urlopen
import json
try:
    filename = sys.argv[1]
except IndexError:
    print('Usage: transcribe.py <file>')
    sys.exit(1)

with open(filename) as f:
    data = f.read()

req = urllib.request('https://www.google.com/intl/en/chrome/demos/speech.html', data=data, headers={'Content-type': 'audio/x-flac; rate=16000'})

try:
    ret = urllib.urlopen(req)
except urllib.URLError:
    print("Error Transcribing Voicemail")
    sys.exit(1)

resp = ret.read()
text = json.loads(resp)['hypotheses'][0]['utterance']
print(text)

Any ideas what I could do?


Answer 1:


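You need to open the file in binary mode for reading:

open(filename, 'rb')

Note the 'b'. Without it, the file is opened in text mode and Python tries to decode its contents to Unicode (as UTF-8 here), which fails on raw audio bytes. With it, f.read() returns a bytes object and no decoding takes place.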
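To make the fix concrete, here is a minimal sketch of reading the audio as bytes and attaching it to a request. The helper names read_audio_bytes and build_request, the throwaway sample file, and the placeholder URL are all assumptions made for illustration; they are not part of the original script or of any Google API. The sketch also uses urllib.request.Request, since the urllib.request(...) call in the question would fail (urllib.request is a module, not a callable, and only urlopen was imported).

```python
import tempfile
import urllib.request


def read_audio_bytes(path):
    """Return the raw file contents as bytes, with no text decoding."""
    with open(path, 'rb') as f:  # 'rb' = read, binary
        return f.read()


def build_request(url, data):
    """Build a POST request carrying the audio bytes (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=data,  # must be bytes in Python 3
        headers={'Content-Type': 'audio/x-flac; rate=16000'},
    )


# Demo on a throwaway file whose byte at position 9 is 0x80,
# which is not a valid UTF-8 start byte (as in the traceback).
with tempfile.NamedTemporaryFile(suffix='.flac', delete=False) as tmp:
    tmp.write(b'fLaC\x00\x00\x00\x22\x10\x80')
    sample_path = tmp.name

data = read_audio_bytes(sample_path)

# Placeholder URL (an assumption); nothing is sent here.
req = build_request('https://example.invalid/recognize', data)
print(type(data).__name__, len(data), req.get_method())
```

Actually sending the request would be done with urllib.request.urlopen(req), with urllib.error.URLError as the exception to catch; that step is left out here because the correct endpoint is not given in the question.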



Source: https://stackoverflow.com/questions/35724820/unicodedecodeerror-from-sound-file
