Encoding.UTF8.GetString doesn't take into account the Preamble/BOM


感情败类 2020-11-28 13:26

In .NET, I'm trying to use the Encoding.UTF8.GetString method, which takes a byte array and converts it to a string.

It looks like this method

4 answers

轻奢々 (OP)

2020-11-28 13:55

It looks like this method ignores the BOM (Byte Order Mark), which might be a part of a legitimate binary representation of a UTF8 string, and takes it as a character.

It doesn't look like it "ignores" it at all - it faithfully converts it to the BOM character. That's what it is, after all.
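A minimal C# sketch of this behaviour, assuming .NET 6 or later (it uses top-level statements). The three UTF-8 BOM bytes come back from Encoding.UTF8.GetString as the single character U+FEFF; they are not dropped.

```csharp
using System;
using System.Text;

// UTF-8 BOM bytes (EF BB BF) followed by the UTF-8 bytes for "hi".
byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };

string decoded = Encoding.UTF8.GetString(bytes);

// The BOM is decoded to the character U+FEFF, so the string is "\uFEFFhi".
Console.WriteLine(decoded.Length);             // 3
Console.WriteLine((int)decoded[0] == 0xFEFF);  // True
```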

If you want to make your code ignore the BOM in any string it converts, that's up to you to do... or use StreamReader.
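A sketch of both options, assuming .NET 6 or later. DecodeWithoutBom and ReadWithStreamReader are hypothetical helper names introduced for this example, not framework APIs. The first compares the start of the buffer with Encoding.GetPreamble() and skips it; the second lets StreamReader detect and discard the BOM.

```csharp
using System;
using System.IO;
using System.Text;

byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };

// Hypothetical helper: strip the encoding's preamble (if present) before decoding.
string DecodeWithoutBom(byte[] data, Encoding encoding)
{
    ReadOnlySpan<byte> preamble = encoding.GetPreamble(); // EF BB BF for Encoding.UTF8
    ReadOnlySpan<byte> span = data;
    if (span.StartsWith(preamble))
        span = span.Slice(preamble.Length);
    return encoding.GetString(span);
}

// Hypothetical helper: StreamReader recognises the BOM and does not return it.
string ReadWithStreamReader(byte[] data)
{
    using var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8);
    return reader.ReadToEnd();
}

Console.WriteLine(DecodeWithoutBom(bytes, Encoding.UTF8)); // hi
Console.WriteLine(ReadWithStreamReader(bytes));            // hi
```

The manual version leaves input without a BOM untouched, so it is safe to call on buffers that may or may not start with one.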

Note that if you use either Encoding.GetBytes followed by Encoding.GetString, or StreamWriter followed by StreamReader, each pair is consistent: GetBytes never writes a BOM and GetString never strips one, while StreamWriter writes the BOM and StreamReader swallows it again. It's only when you mix a StreamWriter (which writes the bytes returned by Encoding.GetPreamble) with a direct Encoding.GetString call that you end up with the "extra" character.
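A sketch of the three combinations, assuming .NET 6 or later, with a MemoryStream standing in for a file. Only the mixed StreamWriter + GetString pair leaves U+FEFF at the start of the string.

```csharp
using System;
using System.IO;
using System.Text;

// StreamWriter with Encoding.UTF8 writes the preamble (EF BB BF) before the text.
var buffer = new MemoryStream();
using (var writer = new StreamWriter(buffer, Encoding.UTF8, bufferSize: 1024, leaveOpen: true))
{
    writer.Write("hi");
}
byte[] written = buffer.ToArray();
Console.WriteLine(written.Length); // 5 (3 BOM bytes + 2 text bytes)

// Mixed: Encoding.GetString keeps the BOM as U+FEFF.
string mixed = Encoding.UTF8.GetString(written);
Console.WriteLine(mixed.Length);   // 3

// Matching pair: StreamReader swallows the BOM that StreamWriter produced.
buffer.Position = 0;
using var reader = new StreamReader(buffer, Encoding.UTF8);
string viaReader = reader.ReadToEnd();
Console.WriteLine(viaReader == "hi"); // True

// Matching pair: GetBytes never emits a BOM, so GetString has nothing to keep.
string roundTrip = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes("hi"));
Console.WriteLine(roundTrip == "hi"); // True
```

The StreamWriter constructors that take no Encoding argument default to UTF-8 without a BOM, so the extra character appears only when an encoding with a non-empty preamble, such as Encoding.UTF8, is passed in explicitly.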
