How to store UTF-8 bytes from a C# String in a SQL Server 2000 TEXT column

Submitted by 守給你的承諾、 on 2019-12-02 00:06:15

If your database collation is SQL_Latin1_General_CP1 (the default for the U.S. edition of SQL Server 2000), then you can use the following trick to store Unicode text as UTF-8 in a char, varchar, or text column:

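// Encode the Unicode string as raw UTF-8 bytes.
byte[] bytes = Encoding.UTF8.GetBytes(Note.Note);
// Reinterpret each byte as one Windows-1252 character before handing it to SQL Server.
noteparam.Value = Encoding.GetEncoding(1252).GetString(bytes);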

Later, when you want to read back the text, reverse the process:

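SqlDataReader reader;
// ...
// Map the stored Windows-1252 characters back to the original byte values.
byte[] bytes = Encoding.GetEncoding(1252).GetBytes((string)reader["Note"]);
// Decode those bytes as UTF-8 to recover the original Unicode string.
string note = Encoding.UTF8.GetString(bytes);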
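
The two snippets above can be wrapped in a pair of helpers. This is a minimal sketch, not part of the original answer: the class name Utf8Smuggler and the method names Protect and Recover are hypothetical, and the codePage parameter defaults to 1252 on the assumption that the database collation is SQL_Latin1_General_CP1. The RegisterProvider call is only needed on .NET Core and .NET 5+, where code page 1252 is not available by default; on .NET Framework it can be dropped.

```csharp
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class Utf8Smuggler
{
    static Utf8Smuggler()
    {
        // Needed on .NET Core / .NET 5+ only; .NET Framework ships code page 1252 built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encode the text as UTF-8, then reinterpret each byte as one character
    // of the given code page. The result is what you assign to the parameter.
    public static string Protect(string text, int codePage = 1252)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(codePage).GetString(utf8);
    }

    // Reverse of Protect: map the stored characters back to bytes,
    // then decode those bytes as UTF-8.
    public static string Recover(string stored, int codePage = 1252)
    {
        byte[] utf8 = Encoding.GetEncoding(codePage).GetBytes(stored);
        return Encoding.UTF8.GetString(utf8);
    }
}
```

With these in place, the write side would become noteparam.Value = Utf8Smuggler.Protect(Note.Note); and the read side string note = Utf8Smuggler.Recover((string)reader["Note"]);. The round trip depends on the code page mapping every byte value reversibly. Windows-1252 leaves five byte values unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), so it is worth testing text whose UTF-8 form contains them (for example "Á", which is C3 81) against your own server.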

If your database collation is not SQL_Latin1_General_CP1, then you will need to replace 1252 with the correct code page.

Note: If you look at the stored text in Enterprise Manager or Query Analyzer, you'll see strange characters in place of non-ASCII text, just as if you opened a UTF-8 document in a text editor that didn't support Unicode.

How it works: When storing Unicode text in a non-Unicode column, SQL Server automatically converts the text from Unicode to the code page specified by the database collation. Any Unicode characters that don't exist in the target code page will be irreversibly mangled, which is why your first two methods didn't work.

But you were on the right track with method one. The missing step is to "protect" the raw UTF-8 bytes by decoding them as if they were Windows-1252 text, so that each byte becomes a single Unicode character that the code page can represent. Now, when SQL Server performs the automatic conversion from Unicode to Windows-1252, every character maps straight back to its byte, and the column ends up holding the original UTF-8 bytes untouched.
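
The difference between the two paths can be tried out without a database. The sketch below only imitates SQL Server's implicit conversion by running a string through .NET's own Windows-1252 encoder and back; it is an assumption that the two behave alike for this purpose, and the class and method names (ConversionDemo, SimulateColumnStore, Protect, Recover) are hypothetical.

```csharp
using System.Text;

// Hypothetical demo class; it never talks to SQL Server.
public static class ConversionDemo
{
    static readonly Encoding Cp1252 = CreateCp1252();

    static Encoding CreateCp1252()
    {
        // Needed on .NET Core / .NET 5+ only; harmless to drop on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Stand-in for storing a Unicode string in a non-Unicode column:
    // convert to the code page, then read it back.
    public static string SimulateColumnStore(string value)
    {
        return Cp1252.GetString(Cp1252.GetBytes(value));
    }

    // UTF-8 bytes reinterpreted as Windows-1252 characters.
    public static string Protect(string text)
    {
        return Cp1252.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Windows-1252 characters mapped back to bytes and decoded as UTF-8.
    public static string Recover(string stored)
    {
        return Encoding.UTF8.GetString(Cp1252.GetBytes(stored));
    }
}
```

Passing "Grüße, 世界" straight to SimulateColumnStore returns a different string, because 世 and 界 have no Windows-1252 equivalent and are replaced. Passing Protect("Grüße, 世界") through SimulateColumnStore returns it unchanged, and Recover then yields the original text. The protected form is also what Enterprise Manager or Query Analyzer would display: Protect("Grüße") gives "GrÃ¼ÃŸe".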
