Question
I have a tab-separated (TSV) file with the following header and first row:
"Message Name" "Field" "Base Label" "Base Label Update Date" "Translated Label" "Translated Label Update Date" "Language"
"Message" "subject_template" "New Task: Assess Distribution Outcomes for ""${docNameNoLink}"", ""${docNumber}""" "8/10/16 4:17:43 PM" "Nouvelle tâche : évaluez le résultat de la distribution de « ${docNameNoLink} »." "2/17/14 5:09:10 AM" "fr"
When I try to read the file with this code:
import csv
import sys

with open(fileName, 'r', encoding='utf-8', errors='replace') as fdata:
    csv.register_dialect('tsv', delimiter='\t', quoting=csv.QUOTE_NONE)
    reader = csv.reader(fdata, dialect='tsv')
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(fileName, reader.line_num, e))
I get this error message: file NameFile, line 1: line contains NULL byte
However, if I run the same code without the errors='replace' (or errors='ignore') argument:
with open(fileName, 'r', encoding='utf-8') as fdata:
    csv.register_dialect('tsv', delimiter='\t', quoting=csv.QUOTE_NONE)
    reader = csv.reader(fdata, dialect='tsv')
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(fileName, reader.line_num, e))
I get the following error message:
  File "csvFiles.py", line 76, in <module>
    for row in reader:
  File "c:\Python35\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
What is the likely cause of this error, and how can I correct it so the script works?
Answer 1:
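Your data is not encoded in 'utf-8' but in 'utf-16-le' or something similar. 'utf-16-le' is just a guess, but when I encode your data with 'utf-16-le', exactly the same errors are produced. Check the encoding of your data file. On Linux you can use the 'file' utility or an editor such as emacs for that.
The error message itself tells us that the first byte of your file is 0xff. This is most likely the first byte of a byte-order mark: a UTF-16 little-endian file normally starts with the two bytes 0xFF 0xFE. It would also explain the NULL byte error: in UTF-16, every ASCII character is stored with an accompanying zero byte, so decoding the file as UTF-8 with errors='replace' yields text full of NUL characters, which the csv module rejects.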
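As a rough illustration of that check, the sketch below looks at the first bytes of a file for a known byte-order mark. The helper name sniff_bom is hypothetical (it is not part of the standard library), and a file without a BOM simply yields None, so treat the result as a hint rather than proof of the encoding.

```python
import codecs

def sniff_bom(first_bytes):
    """Return a codec name suggested by a leading byte-order mark, or None."""
    # The 4-byte UTF-32 marks are tested before UTF-16 because
    # BOM_UTF32_LE begins with the same two bytes (FF FE) as BOM_UTF16_LE.
    candidates = [
        (codecs.BOM_UTF32_LE, 'utf-32'),
        (codecs.BOM_UTF32_BE, 'utf-32'),
        (codecs.BOM_UTF8, 'utf-8-sig'),
        (codecs.BOM_UTF16_LE, 'utf-16'),
        (codecs.BOM_UTF16_BE, 'utf-16'),
    ]
    for bom, name in candidates:
        if first_bytes.startswith(bom):
            return name
    return None

# The 'utf-16' codec writes a BOM first, so this sample mimics such a file.
sample = 'Message Name'.encode('utf-16')
print(sniff_bom(sample[:4]))
```

For a real file, the bytes can be obtained by opening it in binary mode, for example open(fileName, 'rb').read(4), and the returned name can then be passed as the encoding argument of open().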
Answer 2:
Changing just one line of the code may be enough to make it work:
with open(fileName, 'r', encoding='utf-16') as fdata:
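For a fuller picture, here is a minimal sketch of reading such a file end to end. It assumes the file really is UTF-16 with a byte-order mark and uses standard double-quote CSV quoting, as the sample row in the question suggests. The read_tsv helper and the sample data are made up for illustration, and the csv module's default quoting is used instead of the question's csv.QUOTE_NONE so that the quotes are stripped and doubled quotes are unescaped.

```python
import csv
import os
import tempfile

def read_tsv(path, encoding='utf-16'):
    """Read a tab-separated file and return its rows as lists of strings."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding=encoding, newline='') as fdata:
        return list(csv.reader(fdata, delimiter='\t'))

# Build a small UTF-16 sample file (hypothetical data modelled on the question).
line = '"Message"\t"New Task: ""${docNumber}"""\t"fr"\r\n'
with tempfile.NamedTemporaryFile('w', encoding='utf-16', newline='',
                                 suffix='.tsv', delete=False) as tmp:
    tmp.write(line)
    sample_path = tmp.name

rows = read_tsv(sample_path)
os.remove(sample_path)
print(rows)
```

The 'utf-16' codec consumes the byte-order mark and uses it to pick the byte order, so the first field does not start with a stray BOM character.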
Answer 3:
For some reason, Python does not like a single backslash. Try it again, but replace all of your single backslashes with two. Good luck.
Source: https://stackoverflow.com/questions/41725308/python-3-csv-files-and-unicode-error