UnicodeEncodeError with xlrd

Posted by 纵饮孤独 on 2019-12-11 04:34:45

Question


I'm trying to read an .xlsx file with xlrd. I have everything set up and working. It works for data with normal English letters as well as numbers. However, when it gets to Swedish letters (ÄÖÅ), it gives me this error:

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 1: ordinal not in range(128)

My code:

# -*- coding: cp1252 -*-
import xlrd

file_location = "test.xlsx"

workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))

I've even tried:

workbook = xlrd.open_workbook("test.xlsx", encoding_override="utf-8")

as well as:

workbook = xlrd.open_workbook("test.xlsx", encoding="utf-8")

Edit: I'm running Python 2.7 on a Windows 7 64-bit computer.


Answer 1:


'ascii' codec can't encode

The problem here isn't the decoding done when reading the file; it is the encoding needed to turn the Unicode value into bytes for output. In Python 2, str() on a unicode object encodes with the default ASCII codec, and print does the same when your environment uses ASCII for sys.stdout, so any character that can't be encoded in ASCII raises this error.
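The error can be reproduced without xlrd or a spreadsheet. The sketch below is only an illustration and not part of the original answer: the value `u"\xd6l"` is made up, and the explicit `encode("ascii")` call stands in for what Python 2 does implicitly inside str(). It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative value only: "Ö" followed by "l", standing in for a cell value.
value = u"\xd6l"

try:
    # Python 2's str(value) performs this same ASCII encode implicitly.
    value.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# An encoding that contains the character succeeds and yields bytes.
encoded = value.encode("utf-8")
```

Here `message` holds the same "ordinal not in range(128)" text as the traceback in the question, while the UTF-8 encode goes through without error.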

Documentation reference:

The character encoding is platform-dependent. Under Windows, if the stream is interactive (that is, if its isatty() method returns True), the console codepage is used, otherwise the ANSI code page. Under other platforms, the locale encoding is used (see locale.getpreferredencoding()).

Under all platforms though, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python.
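To see which encoding applies in a given environment, and to print without crashing when it cannot represent a character, something like the following might help. It is a minimal sketch: `stdout_encoding` and `to_printable` are hypothetical helper names, and replacing unencodable characters with "?" is an assumption about what is acceptable, not a recommendation from the documentation quoted above.

```python
# -*- coding: utf-8 -*-
import sys


def stdout_encoding():
    """Encoding used for printing; falls back to ASCII when none is reported."""
    return getattr(sys.stdout, "encoding", None) or "ascii"


def to_printable(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    encoding = encoding or stdout_encoding()
    # Encode with replacement, then decode back so the result is still text.
    return text.encode(encoding, "replace").decode(encoding)
```

For example, `to_printable(u"\xd6l", "ascii")` gives `u"?l"`, while a cp1252 or UTF-8 target leaves the text unchanged. Setting PYTHONIOENCODING (for instance to utf-8) before starting Python changes what `sys.stdout.encoding` reports.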




Answer 2:


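Try using utf-8 as @Anand S Kumar suggested, and keep the cell values as unicode when printing. xlrd already returns text cells as unicode objects in Python 2, so calling .decode('utf-8') on them would trigger the same implicit ASCII encode.

# -*- coding: utf-8 -*-
import xlrd

file_location = "test.xlsx"

workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

# unicode() leaves text cells untouched and converts numeric cells to text.
cells = [unicode(sheet.cell_value(1, i)) for i in range(2, 6)]
# Printing a unicode object uses sys.stdout.encoding when one is set.
print u' '.join(cells)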
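The joining step can be tried without a workbook. In the sketch below, a plain list stands in for the values that `sheet.cell_value` would return, and `join_cells` and `text_type` are hypothetical names introduced for illustration. It assumes text cells arrive as Unicode text and numeric cells as floats, and it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# unicode on Python 2, str on Python 3.
text_type = type(u"")


def join_cells(values, sep=u" "):
    """Join cell values as text without ever going through an ASCII encode."""
    parts = []
    for value in values:
        if isinstance(value, text_type):
            parts.append(value)             # already text, keep as-is
        else:
            parts.append(text_type(value))  # numbers and other types
    return sep.join(parts)


# Made-up row standing in for four cells of the sheet.
row = [u"\xd6l", 3.0, u"\xc5r", u"abc"]
line = join_cells(row)
```

The result stays Unicode text throughout; any encoding happens only once, at the point of output.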



Answer 3:


xlrd returns text as Unicode by default. If xlrd cannot recognize the encoding, it assumes that the Excel file uses ASCII. If the data is not actually ASCII, or Python cannot convert it to Unicode, a UnicodeDecodeError is raised.

Fortunately, there is a solution for this kind of problem. It seems that you are using cp1252, so when you open the file with open_workbook(), you can call it as follows:

>>> book = xlrd.open_workbook(filename='filename',encoding_override="cp1252")

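When you call the function this way, xlrd decodes the data with the given encoding and you should be good to go. Note that the xlrd documentation describes encoding_override as a way to overcome missing or bad codepage information in older-version files, so it may not change anything for an .xlsx workbook.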
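What "decoding with cp1252" means can be shown on raw bytes, independent of xlrd. This is only a sketch with a made-up byte string; it does not reproduce xlrd's internals, and it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Made-up bytes: 0xD6 is "Ö" in cp1252, followed by ASCII "l".
raw = b"\xd6l"

# Decoding with the right code page yields Unicode text.
decoded = raw.decode("cp1252")

try:
    # ASCII has no byte 0xD6, so this raises UnicodeDecodeError.
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

This is the decode direction (bytes to text); the error in the question comes from the opposite, encode direction (text to bytes).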
Source(s):

  1. Standard Encodings.
  2. xlrd official documentation
  3. UnicodeDecodeError


Source: https://stackoverflow.com/questions/31661769/unicodeencodeerror-with-xlrd
