Unicode Encode Error 'latin-1' codec can't encode character '\u2019'

人走茶凉 提交于 2020-11-30 00:24:24

问题


I am trying to create a CSV of data from a MySQL RDB to move it over to Amazon Redshift. However, one of the fields contains descriptions and some of those descriptions contain the '’' character, or the right single quotation mark. before when I would run the code, it would give me

UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 62: character maps to <undefined>

I then tried using REPLACE to attempt to get rid of the right single quotation marks.

db = pymysql.connect(host='host', port=3306, user="user", passwd="password", db="db", autocommit=True)
cur = db.cursor()
#cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select field_1, 
        field_2, 
        field_3, 
        field_4, 
        replace(field_4_desc,"’","") field_4_desc, 
        field_5, 
        field_6, 
        field_7 
from table_name """) 


emails = cur.fetchall()
with open('O:\file\path\to\file_name.csv','w') as fileout:
        writer = csv.writer(fileout)
        writer.writerows(emails)   
time.sleep(1)

However, this gave me the error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 132: ordinal not in range(256)

And I noticed 132 is the position of the right single quotation mark in the SQL statement so I beieve the code itself may be having an issue with it. I tried using the regular straight apostrophe instead of the right single quotation mark in the REPLACE statement, however this did not replace the character and still came back with the original error. Does anyone know why it won't accept the single quote and how to fix it?


回答1:


\u2019 is Unicode for , UTF-8 hex E28099, which is a "RIGHT SINGLE QUOTATION MARK". The direct latin1 equivalent is hex 92. Some word processing products use that instead of apostrophe (').

You are getting the error messages, not because you can't handle the character, but because the configuration fails to declare which encoding is used where.

"132" seems irrelevant: 132 84 E2809E „ &#x84;

Notes on Python: http://mysql.rjweb.org/doc.php/charcoll#python
Notes on other charset issues: Trouble with UTF-8 characters; what I see is not what I stored

Without knowing the schema or the Python configuration, I can't be more specific.



来源:https://stackoverflow.com/questions/51157481/unicode-encode-error-latin-1-codec-cant-encode-character-u2019

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!