Pytesseract: UnicodeDecodeError: 'charmap' codec can't decode byte

早过忘川 提交于 2019-11-28 09:14:30

问题


I'm running a large number of OCRs on screenshots with Pytesseract. This is working well in most cases, but a small number is causing this error:

pytesseract.image_to_string(image,None, False, "-psm 6")
Pytesseract: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2: character maps to <undefined>

I'm using Python 3.4. Any suggestions how I can prevent this error from happening (other than just a try/except) would be very helpful.


回答1:


Use Unidecode

from unidecode import unidecode
import pytesseract

strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
strs = unidecode(strs)
print (strs)



回答2:


make sure you are using the right decoding options.
see https://docs.python.org/3/library/codecs.html#standard-encodings

str.decode('utf-8')
bytes.decode('cp950') for Traditional Chinese, etc



来源:https://stackoverflow.com/questions/32927631/pytesseract-unicodedecodeerror-charmap-codec-cant-decode-byte

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!