write unicode data to mssql with python?

问题

I'm trying to write a table from a .csv file with Hebrew text in it to an sql server database.
the table is valid and pandas reads the data correct (even displays the hebrew properly in pycharm),
but when i try to write it to a table in the database i get question marks ("???") where the Hebrew should be.

this is what i've tried, using pandas and sqlalchemy:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pymssql://server/test?charset=utf8')
connection = engine.connect()

df = pd.read_csv("temp.csv", low_memory=False, encoding="UTF-8")
table_name = "test"
df.to_sql(table_name, connection, index=False, if_exists="append")

this loads the table properly but fails to write the Hebrew,
any suggestions?

回答1:

You need to change the datatype of your column. text is deprecated, and varchar(MAX) should be used instead, however, neither can store unicode characters. To store unicode characters you would need to use ntext, which is also deprecated; you need to use nvarchar(MAX).

To change your column's definition, you can use this pseudo-SQL (You'll need to replace the parts in braces ({}) with the appropriate object names):

ALTER TABLE {YourTable} ALTER COLUMN {YourColumn} nvarchar(MAX);

Edit: note, this will not restore any data lost in your column. Once a non-ANSI character is inserted into a varchar (or similar) datatype the data is immediately lost and cannot be recovered apart from by changing the datatype and reentry.

来源：https://stackoverflow.com/questions/53468205/write-unicode-data-to-mssql-with-python

标签

python

sql-server

pandas

utf-8

character-encoding