Question
I am trying to read in a document containing product data and print out certain products' data. The problem is that I can't get it read in without an error. For now I am only printing the first 100 characters to confirm the file loads, so that I can then figure out exactly what I need to print and how to pull it out of the file. But I am stuck at reading it in. The document is in UTF-8, or it should be... what am I missing?
Here is my code:
products = open('products.csv')
productsread = products.read()
print(productsread[:100])
And here is the traceback I get:
Traceback (most recent call last):
  File "nilescratchpad.py", line 2, in <module>
    productsread = products.read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 7451: invalid continuation byte
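To see what is actually sitting at the offset named in the error, one option is to read the file in binary mode (which does no decoding) and decode it yourself, catching the UnicodeDecodeError. The sketch below is illustrative only: the helper name show_bad_bytes and the temporary sample file are hypothetical, and the sample assumes the stray byte came from a single-byte encoding such as Windows-1252 (cp1252), where 0xE2 is the letter 'â'. In UTF-8, 0xE2 starts a three-byte sequence, so the next byte must be in the range 0x80 to 0xBF; when it is not, Python reports "invalid continuation byte".

```python
import os
import tempfile


def show_bad_bytes(path, context=10):
    """Hypothetical helper: decode a file as UTF-8 and, on failure,
    print the raw bytes around the first offending offset."""
    with open(path, "rb") as f:          # binary mode: no decoding happens here
        data = f.read()
    try:
        data.decode("utf-8")
        return None                      # the whole file is valid UTF-8
    except UnicodeDecodeError as exc:
        # exc.start/exc.end index into `data`, the bytes passed to decode()
        lo = max(exc.start - context, 0)
        print(exc)
        print(data[lo:exc.end + context])
        return exc.start


# Hypothetical sample: a UTF-8 header line followed by a line saved as cp1252,
# so 'â' is stored as the single byte 0xE2 followed by 'n' (not a continuation byte).
path = os.path.join(tempfile.mkdtemp(), "products.csv")
with open(path, "wb") as f:
    f.write("name,price\n".encode("utf-8") + "crâne,10\n".encode("cp1252"))

offset = show_bad_bytes(path)
```

Run against your own products.csv, the printed slice shows the neighbouring bytes, which usually makes it clear whether the file is mostly UTF-8 with a few stray bytes or was saved in another encoding altogether.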
Answer 1:
If reading the document with the UTF-8 codec throws an error, then it isn't UTF-8, or at least it contains some bytes that aren't valid UTF-8. Opening it with open('products.csv', encoding='utf8', errors='replace') will replace every undecodable byte sequence with the Unicode code point U+FFFD REPLACEMENT CHARACTER, but make sure that most of your document actually is UTF-8.
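A minimal sketch of that approach applied to the question's goal of printing the first 100 characters. It assumes a hypothetical sample file written to a temporary directory in place of the real products.csv, and uses a with block so the file is closed afterwards. Counting U+FFFD characters afterwards is one rough way (an illustrative heuristic, not a rule) to check whether the document really is mostly UTF-8.

```python
import os
import tempfile

# Hypothetical sample file containing one byte (0xE2) that is not valid UTF-8 here.
path = os.path.join(tempfile.mkdtemp(), "products.csv")
with open(path, "wb") as f:
    f.write(b"name,price\ncr\xe2ne,10\n")

# errors="replace" substitutes U+FFFD for each undecodable byte sequence
# instead of raising UnicodeDecodeError.
with open(path, encoding="utf-8", errors="replace") as products:
    text = products.read()

print(text[:100])

# Rough sanity check: a few replacements suggest mostly-UTF-8 data with stray
# bytes; a large number suggests the file uses a different encoding entirely.
bad = text.count("\ufffd")
print(bad, "replacement character(s) in", len(text), "characters")
```

If the count turns out to be large, replacing characters only hides the problem, and it is worth opening the file with the encoding it was actually saved in instead.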
Source: https://stackoverflow.com/questions/46623798/unicodedecodeerror-invalid-continuation-byte-when-trying-to-read-in-document