UnicodeEncodeError with xlrd

Question

I'm trying to read an .xlsx file with xlrd. Everything is set up and working for data with plain English letters and numbers. However, when it gets to Swedish letters (ÄÖÅ), it gives me this error:

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 1: ordinal not in range(128)

My code:

# -*- coding: cp1252 -*-
import xlrd

file_location = "test.xlsx"

workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))

I've even tried:

workbook = xlrd.open_workbook("test.xlsx", encoding_override="utf-8")

as well as:

workbook = xlrd.open_workbook("test.xlsx", encoding="utf-8")

Edit: I'm running Python 2.7 on a Windows 7 64-bit computer.

Answer 1:

'ascii' codec can't encode

The problem here isn't the decoding done when reading the file; it is the encoding needed to turn the value into output. xlrd returns text cells as unicode objects. In Python 2, str() on a unicode object encodes it with the default ASCII codec, and print does the same when sys.stdout uses ASCII, so any character that can't be encoded in ASCII raises this error.
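The failure can be reproduced without xlrd or a spreadsheet. The following is a minimal sketch, assuming a hypothetical cell value `cell_text` that stands in for what `sheet.cell_value()` returns for a Swedish word; the helper `try_encode` is also invented for illustration. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for a text cell returned by xlrd: the word "Öl".
cell_text = u"\xd6l"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# ASCII only covers code points below 128, so u'\xd6' cannot be encoded.
ascii_result = try_encode(cell_text, "ascii")
# cp1252 and UTF-8 can both represent the character.
cp1252_result = try_encode(cell_text, "cp1252")
utf8_result = try_encode(cell_text, "utf-8")

print("ascii: %r, cp1252: %r, utf-8: %r" % (ascii_result, cp1252_result, utf8_result))
```

Only the ASCII attempt fails, which matches the error message in the question: the data was read correctly, and the exception comes from the encode step.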

Documentation reference:

The character encoding is platform-dependent. Under Windows, if the stream is interactive (that is, if its isatty() method returns True), the console codepage is used, otherwise the ANSI code page. Under other platforms, the locale encoding is used (see locale.getpreferredencoding()).

Under all platforms though, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python.
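To see which encoding Python picked for the output stream, and to sidestep it by encoding explicitly, something like the following may help. This is a sketch under assumptions: `stdout_encoding` and `write_encoded` are hypothetical helper names, and an in-memory `io.BytesIO` stands in for a real binary output stream.

```python
# -*- coding: utf-8 -*-
import io
import sys


def stdout_encoding():
    """Return the encoding Python chose for sys.stdout, or None if unknown."""
    return getattr(sys.stdout, "encoding", None)


def write_encoded(stream, text, encoding="utf-8"):
    """Encode text explicitly and write the resulting bytes to a binary stream."""
    stream.write(text.encode(encoding))


print("stdout encoding: %s" % stdout_encoding())

# In-memory binary stream used here in place of a file or pipe.
sink = io.BytesIO()
write_encoded(sink, u"\xd6l\n")
```

On Windows, the environment variable mentioned above can be set in the command prompt with `set PYTHONIOENCODING=utf-8` before starting Python.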

Answer 2:

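Try to use utf-8 as @Anand S Kumar suggested, and encode the strings before printing instead of calling str() on them. Text cells from xlrd are already unicode objects, so they need encoding, not decoding. The output is UTF-8 bytes, so the terminal must be set to UTF-8 to display them correctly.

# -*- coding: utf-8 -*-
import xlrd

file_location = "test.xlsx"

workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

# unicode() handles both text cells (unicode) and numeric cells (float)
# without the implicit ASCII encode that str() performs.
cells = [unicode(sheet.cell_value(1, i)) for i in range(2, 6)]
# Encode explicitly so the result does not depend on the ASCII default.
print u' '.join(cells).encode('utf-8')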
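A row usually mixes text and numbers, and xlrd returns numeric cells as floats, so it can be convenient to normalize every value to text before joining. The sketch below assumes hypothetical helpers `cell_to_text` and `join_row` and a made-up sample row; rendering whole-number floats without the trailing ".0" is a design choice, not something xlrd does. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def cell_to_text(value):
    """Convert a cell value (text or float) to text without using str() on Python 2."""
    if isinstance(value, float) and value.is_integer():
        # Show 42.0 as "42"; this formatting is an illustrative choice.
        return text_type(int(value))
    return text_type(value)


def join_row(values, sep=u" "):
    """Join cell values into one line of text."""
    return sep.join(cell_to_text(v) for v in values)


# Made-up values standing in for sheet.cell_value(1, 2) .. sheet.cell_value(1, 5).
row = [u"\xd6rebro", 42.0, u"G\xf6teborg", 3.5]
line = join_row(row)
```

The result `line` is still Unicode text; it only needs encoding at the point where it is written out.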

Answer 3:

xlrd returns text as Unicode. If xlrd cannot determine the encoding of the file, it assumes that the Excel file is ASCII-encoded. If the data is not actually ASCII, or if Python cannot convert it to Unicode, a UnicodeDecodeError is raised.

There is a solution for this kind of problem. It seems that you are using cp1252, so when you open the file with open_workbook() you can pass the encoding explicitly:

>>> book = xlrd.open_workbook(filename='filename', encoding_override="cp1252")

xlrd will then decode the file contents as cp1252 instead of guessing. Note that encoding_override is documented as a way to overcome missing or bad codepage information in older-version files.
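The difference between the ASCII fallback and an explicit cp1252 override can be illustrated with plain byte strings. This is only a sketch of the decoding behaviour, assuming a hypothetical byte string `raw` as it might be stored in a cp1252-encoded file; it does not call xlrd.

```python
# -*- coding: utf-8 -*-

# Hypothetical raw bytes for the word "Öl" as stored in a cp1252-encoded file.
raw = b"\xd6l"

# Decoding as ASCII fails because byte 0xd6 is outside range(128).
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the correct codec yields the intended Unicode text.
decoded = raw.decode("cp1252")

print("ascii failed: %s, cp1252 gives %r" % (ascii_failed, decoded))
```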
Source(s):

Standard Encodings.
xlrd official documentation
UnicodeDecodeError

Source: https://stackoverflow.com/questions/31661769/unicodeencodeerror-with-xlrd

Tags

python

unicode

xlrd

unicode-string