Avoiding code change with Microsoft SQLServer and Unicode

こ雲淡風輕ζ 提交于 2020-01-16 03:48:06

问题


How can you get MSSQL server to accept Unicode data by default into a VARCHAR or NVARCHAR column?

I know that you can do it by placing a N in front of the string to be placed in the field but to by quite honest this seems a bit archaic in 2008 and particuarily with using SQL Server 2005.


回答1:


The N syntax is how you specify a unicode string literal in SQL Server.

N'Unicode string'
'ANSI string'

SQL Server will auto convert between the two when possible, using either a column's collation or the database's collation.

So if your string literals don't actually contain unicode characters, you do not need to specify the N prefix.

But if your string literals do contain unicode characters then using the N prefix is necessary.




回答2:


If this is a web application, you could probably get your webserver to use UTF8 as it's default encoding. That way all data back and forth to the browser would be UTF8 which can be inserted into VARCHAR fields. UTF8 is a nice way to make applications that are not aware of Unicode deal with it.




回答3:


They really need a way to turn off the need for the N'' prefix. The "it's needed for backwards compatibility" argument makes zero sense to me - sure, make that behavior the default for old apps, but provide an option for me to turn on Unicode strings by default (i.e, no N'' prefix required.) I'm discovering that I need to go and mess with large areas of my app to adapt to Unicode on SQL Server when this is NOT an issue in Oracle and Postgresql. C'mon, Microsoft!




回答4:


While you can simply store UTF8 content in a VARCHAR field in MSSQL Server as long as charset translation is not done you should be aware that:

  1. No management/reporting/data tools outside of your application will be able to understand your non-english characters.

  2. Language specific handling such as sorting a list of names may not be done in the order acceptable for every language.

  3. Must be careful about data truncation. Truncating a multi-byte UTF8 character ususally causes data corruption for the character involved. You should always reject input if it exceeds the field length.

  4. It may not be as easy as you think to disable charset translation..Even if you turn it off in your client driver it can still be overriden in some cases if there is a significant locale difference between client and RDBMS codepage used which instantly leads to data corruption.

  5. If you think this is all you will have to worry about your fooling yourself.

In summary while you might be tempted to go down this path its not a good idea. Code change is required when going multi-byte.



来源:https://stackoverflow.com/questions/167152/avoiding-code-change-with-microsoft-sqlserver-and-unicode

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!