Unicode Encode Error 'latin-1' codec can't encode character '\u2019'

问题

I am trying to create a CSV of data from a MySQL RDB to move it over to Amazon Redshift. However, one of the fields contains descriptions and some of those descriptions contain the '’' character, or the right single quotation mark. before when I would run the code, it would give me

UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 62: character maps to <undefined>

I then tried using REPLACE to attempt to get rid of the right single quotation marks.

db = pymysql.connect(host='host', port=3306, user="user", passwd="password", db="db", autocommit=True)
cur = db.cursor()
#cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select field_1, 
        field_2, 
        field_3, 
        field_4, 
        replace(field_4_desc,"’","") field_4_desc, 
        field_5, 
        field_6, 
        field_7 
from table_name """) 


emails = cur.fetchall()
with open('O:\file\path\to\file_name.csv','w') as fileout:
        writer = csv.writer(fileout)
        writer.writerows(emails)   
time.sleep(1)

However, this gave me the error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 132: ordinal not in range(256)

And I noticed 132 is the position of the right single quotation mark in the SQL statement so I beieve the code itself may be having an issue with it. I tried using the regular straight apostrophe instead of the right single quotation mark in the REPLACE statement, however this did not replace the character and still came back with the original error. Does anyone know why it won't accept the single quote and how to fix it?

回答1:

\u2019 is Unicode for ’, UTF-8 hex E28099, which is a "RIGHT SINGLE QUOTATION MARK". The direct latin1 equivalent is hex 92. Some word processing products use that instead of apostrophe (').

You are getting the error messages, not because you can't handle the character, but because the configuration fails to declare which encoding is used where.

"132" seems irrelevant: 132 84 E2809E „ 

Notes on Python: http://mysql.rjweb.org/doc.php/charcoll#python
Notes on other charset issues: Trouble with UTF-8 characters; what I see is not what I stored

Without knowing the schema or the Python configuration, I can't be more specific.

来源：https://stackoverflow.com/questions/51157481/unicode-encode-error-latin-1-codec-cant-encode-character-u2019

标签

python

mysql

csv

unicode