Visual Studio 2008 project file does not load because of an unexpected encoding change

最后都变了 - submitted on 2019-12-05 05:45:57

I think I can provide some insight into what's happening, if not why.

FF FE is a BOM; its presence at the beginning of the file indicates that the file's encoding is UTF-16, little-endian. And it sounds like the original file really is UTF-16, but something is ignoring the BOM and reading it as if it were UTF-8.
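
To check which BOM, if any, a project file actually starts with, something like the following Python sketch could be used. The helper name sniff_bom is made up for illustration. The BOM constants come from Python's standard codecs module.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Hypothetical sample: FF FE followed by "<?" encoded as UTF-16 LE.
print(sniff_bom(b"\xff\xfe<\x00?\x00"))
```

In practice you would pass it the first few bytes of the file, read in binary mode.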

When that happens, each of the bytes FF and FE is invalid as UTF-8 and gets converted to U+FFFD, the Unicode replacement character. Then, when the text is written to a file again, each replacement character is encoded as UTF-8 (EF BF BD) and the UTF-8 BOM (EF BB BF) is added in front of them, resulting in the nine-byte sequence you reported:

EF BB BF  # UTF-8 BOM
EF BF BD  # U+FFFD in UTF-8
EF BF BD  # ditto
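
The round trip can be reproduced in a few lines of Python. This is only a sketch of the mechanism, not a claim about which tool did it in your case. It assumes the reader substitutes U+FFFD for invalid bytes (errors="replace") and that the writer emits a UTF-8 BOM (the utf-8-sig codec).

```python
# The UTF-16 LE BOM as it sits at the start of the original file.
utf16_bom = b"\xff\xfe"

# Misread as UTF-8: neither byte is valid, so each becomes U+FFFD.
text = utf16_bom.decode("utf-8", errors="replace")

# Written back as UTF-8 with a BOM in front.
rewritten = text.encode("utf-8-sig")

print(" ".join(f"{b:02X}" for b in rewritten))
# EF BB BF EF BF BD EF BF BD
```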

If this is the case, simply replacing those nine bytes with FF FE is not safe. There's no guarantee those are the only bytes in the file that would be invalid when interpreted as UTF-8. As long as the file contains only ASCII characters you're okay, but anything else, like accented characters (é) or curly quotes (“ ”), will be irretrievably mangled.
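
Before patching a damaged file by hand, it may be worth counting how many other replacement characters it contains. The sketch below is one way to do that. The name inspect_damage and the sample bytes are invented for illustration. Note that a count of zero only means no other bytes were replaced, not that the file is otherwise correct.

```python
# UTF-8 BOM followed by two encoded U+FFFD characters (the nine bytes above).
MANGLED_PREFIX = bytes.fromhex("EFBBBF EFBFBD EFBFBD")
REPLACEMENT = "\ufffd".encode("utf-8")  # EF BF BD


def inspect_damage(data):
    """Return (starts_with_mangled_bom, number_of_other_replacement_chars)."""
    has_prefix = data.startswith(MANGLED_PREFIX)
    body = data[len(MANGLED_PREFIX):] if has_prefix else data
    return has_prefix, body.count(REPLACEMENT)


# Hypothetical file: FF FE followed by markup holding one non-ASCII byte (E9),
# pushed through the same misread-and-rewrite round trip.
original = b"\xff\xfe<Name>caf\xe9</Name>"
mangled = original.decode("utf-8", errors="replace").encode("utf-8-sig")

print(inspect_damage(mangled))
# (True, 1)
```

The 1 in the result is the byte that used to be E9. Once it has become U+FFFD, nothing in the file says what it was, which is why the damage cannot be undone from the mangled copy alone.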

Are the project files really supposed to be UTF-16? If not, maybe that one developer's system is generating UTF-16 when the version-control system is expecting UTF-8. I notice in my Visual C# Express install there's an option under Environment->Documents called "Save documents as Unicode when data cannot be saved in codepage". That sounds like something that could cause the encoding to change at apparently random times.
