Again: UnicodeEncodeError: ascii codec can't encode

Submitted by 一曲冷凌霜 on 2019-12-06 01:38:41

You are parsing XML, and the XML API hands you unicode values. You are then attempting to write that unicode data to a CSV file without encoding it first, so Python tries to encode it for you (using the default ASCII codec) and fails. You can see this in your traceback: it is the .writerows() call that fails, and the error says that encoding, not decoding (parsing the XML), is what went wrong.
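To see the failure in isolation, here is a minimal sketch written for Python 3, where `str` plays the role of Python 2's `unicode`. The sample title is made up and `try_encode` is a hypothetical helper. Encoding with the `ascii` codec, which is what Python 2 falls back to when it must turn `unicode` into bytes implicitly, raises `UnicodeEncodeError`, while an explicit UTF-8 encode succeeds.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec can't represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# Hypothetical sample value standing in for elem.text from the XML parser.
title = "Caf\u00e9 Safety Rule"

print(try_encode(title, "ascii"))  # None: 'é' (U+00E9) is outside ASCII
print(try_encode(title, "utf-8"))  # b'Caf\xc3\xa9 Safety Rule'
```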

You need to choose an encoding, then encode your data before writing:

for elem in tree.iter():
    if elem.tag == "AGENCY_CODE":
        agencycodes.append(int(elem.text))
    elif elem.tag == "RIN":
        rins.append(elem.text.encode('utf8'))
    elif elem.tag == "TITLE":
        titles.append(elem.text.encode('utf8'))

I used the UTF8 encoding because it can handle any Unicode code point, but you need to make your own, explicit choice.
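A small Python 3 sketch of why UTF-8 is a safe default: it encodes every code point in 1 to 4 bytes and always decodes back to the same text, while a narrower codec such as latin-1 covers only the first 256 code points. The sample characters and both helper functions are illustrative, not part of the original answer.

```python
def utf8_lengths(chars):
    """Return how many bytes UTF-8 needs for each character."""
    lengths = []
    for ch in chars:
        encoded = ch.encode("utf-8")
        assert encoded.decode("utf-8") == ch  # UTF-8 always round-trips
        lengths.append(len(encoded))
    return lengths


def latin1_can_encode(ch):
    """True if latin-1 (code points 0..255 only) can represent ch."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]  # A, é, €, musical G clef
print(utf8_lengths(samples))                                  # [1, 2, 3, 4]
print(latin1_can_encode("\u00e9"), latin1_can_encode("\u20ac"))  # True False
```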

It sounds like you have a non-ASCII character somewhere in your XML file. A unicode object is different from a str that holds UTF-8-encoded bytes.
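The distinction in Python 3 terms, as a sketch: Python 2's `unicode` corresponds to Python 3's `str`, and Python 2's byte `str` corresponds to `bytes`. The sample word is arbitrary.

```python
text = "\u00e9t\u00e9"        # Unicode text 'été': 3 code points
data = text.encode("utf-8")   # UTF-8 bytes b'\xc3\xa9t\xc3\xa9': 5 bytes

print(type(text).__name__, len(text))  # str 3
print(type(data).__name__, len(data))  # bytes 5
print(data.decode("utf-8") == text)    # True: decoding reverses encoding
```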

The Python 2.7 csv module doesn't support Unicode input, so you'll have to run the data through a function that encodes it before you write it to your CSV file.
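For comparison, and only if moving to Python 3 is an option: there the `csv` module works with text, and the encoding is chosen once when the file is opened, so no per-value encoding step is needed. A minimal sketch; `write_rows`, the file name `rules.csv` and the sample row are hypothetical, and the file is written to a temporary directory.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # Python 3: csv wants a text-mode file, and the encoding is chosen here.
    # newline="" lets the csv writer control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


rows = [[1, "RIN-0001", "Caf\u00e9 Safety Rule"]]  # hypothetical sample row
path = os.path.join(tempfile.mkdtemp(), "rules.csv")
write_rows(path, rows)

with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'1,RIN-0001,Caf\xc3\xa9 Safety Rule\r\n'
```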

def normalize(s):
    # Encode unicode objects to UTF-8 bytes; convert anything else with str().
    if isinstance(s, unicode):
        return s.encode('utf8', 'ignore')
    else:
        return str(s)
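The second argument to `encode()` in `normalize` is the error handler. A quick Python 3 sketch (sample string made up) of what common handlers do when a character can't be encoded. With `utf8` there is practically nothing for `'ignore'` to drop, since UTF-8 can encode every ordinary character; the handler matters mainly with narrower codecs such as `ascii`.

```python
text = "caf\u00e9"  # 'café'; é (U+00E9) has no ASCII representation

for handler in ("ignore", "replace", "xmlcharrefreplace"):
    print(handler, text.encode("ascii", handler))
# ignore b'caf'
# replace b'caf?'
# xmlcharrefreplace b'caf&#233;'

# With UTF-8 every character here is encodable, so 'ignore' changes nothing.
print(text.encode("utf8", "ignore") == text.encode("utf8"))  # True
```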

So your code would look like this:

for elem in tree.iter():
    if elem.tag == "AGENCY_CODE":
        agencycodes.append(int(elem.text))
    elif elem.tag == "RIN":
        rins.append(normalize(elem.text))
    elif elem.tag == "TITLE":
        titles.append(normalize(elem.text))
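For reference, a self-contained Python 3 sketch of the same loop, where `elem.text` is already `str` and no `normalize` step is needed; the resulting rows can go straight to a csv writer whose file was opened with an explicit encoding. The XML snippet (including the `RULES`/`RULE` wrapper elements) is invented to match the tag names in the question, and `collect` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical input using the tag names from the question.
SAMPLE_XML = """<RULES>
  <RULE><AGENCY_CODE>42</AGENCY_CODE><RIN>RIN-0001</RIN><TITLE>Caf\u00e9 Safety Rule</TITLE></RULE>
</RULES>"""


def collect(root):
    agencycodes, rins, titles = [], [], []
    for elem in root.iter():
        if elem.tag == "AGENCY_CODE":
            agencycodes.append(int(elem.text))
        elif elem.tag == "RIN":
            rins.append(elem.text)    # already Unicode text in Python 3
        elif elem.tag == "TITLE":
            titles.append(elem.text)
    # One row per record, ready for csv.writer(...).writerows(rows).
    return list(zip(agencycodes, rins, titles))


rows = collect(ET.fromstring(SAMPLE_XML))
print(rows)  # [(42, 'RIN-0001', 'Café Safety Rule')]
```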