Converting special characters such as Ã¼ and Ãƒ back to their original Latin alphabet counterparts in C#


感情败类 2020-12-30 02:02

I have been given an export from a MySQL database that seems to have had its encoding muddled somewhat over time and contains a mix of HTML char codes such as

5 Answers

佛祖请我去吃肉

2020-12-30 02:12
Well, first of all, as the data has been decoded using the wrong encoding, it's likely that some of the characters are impossible to recover. It looks like it's UTF-8 data that was incorrectly decoded using an 8-bit encoding.
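
To make that diagnosis concrete, here is a minimal sketch of how such text can come about. It assumes the 8-bit encoding was ISO-8859-1, which is only a guess about your data; `MojibakeDemo` and `Corrupt` are hypothetical names used for illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: simulates the corruption by taking the UTF-8 bytes
    // of a string and decoding them as ISO-8859-1 (an assumed 8-bit encoding).
    public static string Corrupt(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);   // "ü" becomes 0xC3 0xBC
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

Calling `MojibakeDemo.Corrupt("\u00FC")` (that is, "ü") yields the two characters U+00C3 and U+00BC, which is the Ã¼ from the title. Under Windows-1252 the character Ã itself turns into Ãƒ in the same way, so the Ãƒ in your data may mean that some values went through the mistake more than once.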

There is no built-in method to recover data like this, because it's not something that you normally do. There is no reliable way to decode the data, because it's already broken.

What you can try is to undo the mistake in reverse: encode the string back into bytes using the wrong 8-bit encoding, and then decode those bytes as UTF-8:
```
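// Turn the garbled text back into bytes using the (wrong) 8-bit encoding.
byte[] data = Encoding.Default.GetBytes(input);
// Decode those bytes as what they originally were: UTF-8.
string output = Encoding.UTF8.GetString(data);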
```
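On .NET Framework, Encoding.Default is the current ANSI encoding for your system. On .NET Core and later it is always UTF-8, so there you have to name the 8-bit encoding explicitly. You can try some different encodings there and see which one gives the best result.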
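
One way to "try some different encodings" without checking the output by eye is to make both conversion steps strict, so that a wrong guess throws instead of silently producing `?` or U+FFFD. This is only a sketch under assumptions: `MojibakeRepair`, `TryRepair` and `RepairWithFirstMatch` are hypothetical names, not framework APIs, and the candidate encoding names are guesses you would adapt to your data.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Strict UTF-8 decoder: throws on invalid byte sequences instead of emitting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Hypothetical helper: reverses one round of "UTF-8 bytes decoded as
    // wrongEncodingName". Returns false when the round trip is lossy, so the
    // caller can move on to the next candidate. On failure, output is the input.
    public static bool TryRepair(string input, string wrongEncodingName, out string output)
    {
        output = input;
        try
        {
            Encoding wrong = Encoding.GetEncoding(
                wrongEncodingName,
                EncoderFallback.ExceptionFallback,   // unmappable char: throw, don't write '?'
                DecoderFallback.ExceptionFallback);
            output = StrictUtf8.GetString(wrong.GetBytes(input));
            return true;
        }
        catch (ArgumentException)
        {
            // Covers EncoderFallbackException, DecoderFallbackException,
            // and an encoding name that this runtime does not know.
            return false;
        }
    }

    // Tries each candidate in order and returns the first clean repair,
    // or the input unchanged if none succeeds.
    public static string RepairWithFirstMatch(string input, params string[] candidates)
    {
        foreach (string name in candidates)
        {
            if (TryRepair(input, name, out string repaired)) return repaired;
        }
        return input;
    }
}
```

For example, `MojibakeRepair.RepairWithFirstMatch(input, "iso-8859-1", "windows-1252")` tries ISO-8859-1 first and Windows-1252 second. On .NET Core and later, Windows-1252 is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; without that, the sketch simply skips it. Note that plain ASCII text passes through any candidate unchanged, and a string that mixes broken and intact characters fails as a whole, so you may need to apply this per field or per word.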