Python 3 CSV file giving UnicodeDecodeError: 'utf-8' codec can't decode byte error when I print

梦如初夏 2020-12-03 01:18

I have the following code in Python 3, which is meant to print out each line in a csv file.

import csv
with open('my_file.csv', 'r', newline='') as csvfile:
    ...


        
7 Answers
  • 2020-12-03 01:22

    Easy: open the file in Excel or OpenOffice Calc, use Text to Columns, select the comma as the delimiter, and then save the file as .csv. It took me a day and several hours of searching on Google, but in the end I figured it out.

  • 2020-12-03 01:26

    We know the file contains the byte b'\x96' since it is mentioned in the error message:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
    

    Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

    import encodings.aliases
    import os
    import pkgutil
    
    def all_encodings():
        # Codec module names found in the standard library's encodings package
        modnames = {modname for _, modname, _ in pkgutil.walk_packages(
            path=[os.path.dirname(encodings.__file__)], prefix='')}
        # Canonical codec names that the registered aliases point to
        aliases = set(encodings.aliases.aliases.values())
        return modnames | aliases
    
    text = b'\x96'
    for enc in all_encodings():
        try:
            msg = text.decode(enc)
        except Exception:
            # Skip names that are not text codecs or cannot decode this byte
            continue
        if msg == 'ñ':
            print('Decoding {t} with {enc} is {m}'.format(t=text, enc=enc, m=msg))
    

    which yields

    Decoding b'\x96' with mac_roman is ñ
    Decoding b'\x96' with mac_farsi is ñ
    Decoding b'\x96' with mac_croatian is ñ
    Decoding b'\x96' with mac_arabic is ñ
    Decoding b'\x96' with mac_romanian is ñ
    Decoding b'\x96' with mac_iceland is ñ
    Decoding b'\x96' with mac_turkish is ñ
    

    Therefore, try changing

    with open('my_file.csv', 'r', newline='') as csvfile:
    

    to use one of those encodings, such as:

    with open('my_file.csv', 'r', encoding='mac_roman', newline='') as csvfile:
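
A minimal sketch of how the suggestion above could be checked, assuming the file really is Mac Roman encoded. The helper names `read_rows` and `describe_decode_failure` are hypothetical and not part of the original answer: the first reads the CSV with a chosen encoding, the second reports where a decode attempt fails and which byte caused it.

```python
import csv


def read_rows(path, encoding):
    """Return all rows of a CSV file, decoded with the given encoding."""
    with open(path, 'r', encoding=encoding, newline='') as csvfile:
        return list(csv.reader(csvfile))


def describe_decode_failure(path, encoding='utf-8'):
    """Return (offset, offending byte) for the first decode failure.

    Returns None when the whole file decodes cleanly with `encoding`.
    """
    with open(path, 'rb') as f:
        data = f.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start, data[exc.start:exc.start + 1]
    return None
```

If `describe_decode_failure('my_file.csv')` reports the byte `b'\x96'`, calling `read_rows('my_file.csv', 'mac_roman')` and inspecting the affected row is a quick way to see whether the text now looks right.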
    
  • 2020-12-03 01:27
    with open('my_file.csv', 'r', newline='', encoding='utf-8') as csvfile:
    

    Try opening the file as shown above.

  • 2020-12-03 01:29

    A much simpler solution is to open the CSV file in Notepad and select "Save As" from the "File" menu. Set "Save as type" to "All Files (*.*)", select "UTF-8" in the "Encoding" dropdown list, and add the ".csv" extension to the file name.
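
The same re-save step can be sketched in Python for cases where an editor is not convenient. This is only an illustration: `reencode_to_utf8` is a hypothetical helper, and it assumes you already know the file's current encoding (passed as `source_encoding`), which the original answer does not state.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding):
    """Copy a text file, decoding it from source_encoding and writing UTF-8."""
    # newline='' disables newline translation so line endings are preserved
    with open(src_path, 'r', encoding=source_encoding, newline='') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
```

For example, `reencode_to_utf8('my_file.csv', 'my_file_utf8.csv', 'mac_roman')` would write a UTF-8 copy that the code in the question can read without passing `encoding`, provided the guess about the source encoding is right.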

  • 2020-12-03 01:33

    with open('my_file.csv', 'r', newline='', encoding='ISO-8859-1') as csvfile:

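    UTF-8 can represent ñ, but this file is not encoded in UTF-8, so the byte that stands for ñ cannot be decoded as UTF-8. To avoid the error, you may use the ISO-8859-1 encoding instead: it maps every byte to a character, so decoding never fails, although you should check that the characters come out as expected. For more details about this encoding, you may refer to the link below: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html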
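
Because ISO-8859-1 accepts every byte, a successful read does not prove the encoding is the right one. A small sketch for comparing how a single byte decodes under a few candidates (`decode_candidates` is a hypothetical helper, and the list of candidates is only an assumption about what is worth trying):

```python
def decode_candidates(raw, encodings=('iso-8859-1', 'cp1252', 'mac_roman')):
    """Map each candidate encoding to the text it produces for raw bytes."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # Record encodings that reject the bytes instead of skipping them
            results[enc] = None
    return results


for enc, text in decode_candidates(b'\x96').items():
    # repr() makes invisible control characters visible
    print(enc, repr(text))
```

For the byte `b'\x96'` from the error message, ISO-8859-1 yields the control character U+0096, cp1252 yields an en dash, and mac_roman yields ñ, so it is worth looking at the decoded text before settling on an encoding.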

  • 2020-12-03 01:35

    I also faced this issue with Python 3, and it was resolved by setting the encoding to utf-16:

    with open('data.csv', newline='', encoding='utf-16') as csvfile:
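
Files saved as UTF-16 usually start with a byte order mark (BOM), so one way to decide whether utf-16 is worth trying is to look at the first few bytes. The sketch below is a heuristic under that assumption; `sniff_bom` is a hypothetical helper, and a file without a BOM simply returns None.

```python
import codecs


def sniff_bom(path):
    """Guess an encoding name from a leading byte order mark, if any."""
    with open(path, 'rb') as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    return None
```

If `sniff_bom('data.csv')` returns a name, it can be passed as the `encoding` argument to `open`; if it returns None, the encoding has to be determined some other way.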
    