Problem converting ANSI to UTF-8 in C#

纵饮孤独 submitted on 2019-12-01 16:27:16

Do you have any idea why this is happening?

Yes: you are converting too late. You need to specify the ANSI encoding when you read the string from the file. Once the text is in memory, a .NET string is always Unicode (UTF-16).
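A minimal sketch of that advice, assuming the input file is Windows-1252 (the usual "ANSI" code page on Western European Windows) and a .NET 5 or later runtime, where code page 1252 only becomes available after registering CodePagesEncodingProvider. The temp-file names are hypothetical stand-ins for your real files.

```csharp
using System;
using System.IO;
using System.Text;

// .NET Core / .NET 5+ only ship UTF-8, UTF-16, ASCII and Latin-1 by default;
// registering this provider makes Windows code pages such as 1252 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding ansi = Encoding.GetEncoding(1252);

// Hypothetical input: "café" stored as Windows-1252 bytes (é = 0xE9).
string inputPath = Path.Combine(Path.GetTempPath(), "ansi-input.txt");
string outputPath = Path.Combine(Path.GetTempPath(), "utf8-output.txt");
File.WriteAllBytes(inputPath, new byte[] { 0x63, 0x61, 0x66, 0xE9 });

// Tell the reader which encoding the file uses. The result is an ordinary
// (UTF-16) .NET string, so no further "conversion" of the string is needed.
string text = File.ReadAllText(inputPath, ansi);

// The encoding only matters again when writing bytes back out.
File.WriteAllText(outputPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

byte[] utf8Bytes = File.ReadAllBytes(outputPath);
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // 63-61-66-C3-A9
```

The key point is that the encoding is passed to File.ReadAllText, at the moment bytes become a string, rather than applied to the string afterwards.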

Andrey

When you convert to ASCII, you immediately lose all non-English characters (including accented ones), because ASCII defines only 128 characters (7 bits, codes 0 to 127).
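A short illustration of that loss, using only the built-in Encoding.ASCII and Encoding.UTF8 (no extra setup needed). The sample word is arbitrary.

```csharp
using System;
using System.Text;

string original = "caf\u00E9"; // "café"

// Encoding.ASCII replaces every character above U+007F with '?' (0x3F).
byte[] asciiBytes = Encoding.ASCII.GetBytes(original);
string fromAscii = Encoding.ASCII.GetString(asciiBytes);

// UTF-8 can represent every Unicode character, so it round-trips losslessly.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
string fromUtf8 = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(fromAscii);            // caf?
Console.WriteLine(fromUtf8 == original); // True
```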

You are doing a strange manipulation. A string in .NET is always UTF-16, so as long as you return a string rather than a byte[], the target encoding makes no difference.
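To illustrate the point, a sketch showing that the same string yields different bytes under UTF-8 and UTF-16, yet decodes back to an identical string either way. The sample string is arbitrary.

```csharp
using System;
using System.Text;

string s = "caf\u00E9"; // "café"

// The same string produces different byte sequences under different encodings...
byte[] utf8 = Encoding.UTF8.GetBytes(s);     // é becomes 2 bytes: C3 A9
byte[] utf16 = Encoding.Unicode.GetBytes(s); // every char here is 2 bytes (UTF-16LE, no BOM)

// ...but decoding each with its own encoding gives back an identical string,
// so "converting" a string and returning a string again changes nothing.
bool sameFromUtf8 = Encoding.UTF8.GetString(utf8) == s;
bool sameFromUtf16 = Encoding.Unicode.GetString(utf16) == s;

Console.WriteLine($"{utf8.Length} {utf16.Length} {sameFromUtf8} {sameFromUtf16}"); // 5 8 True True
```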

I think you should do this (I assume by ANSI you mean Windows-1252, the Western European ANSI code page, which is close to but not identical to Latin-1):

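// Requires using System.Text; on .NET Core / .NET 5+, call
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once before using code page 1252.
public byte[] Encode(string text)
{
    return Encoding.GetEncoding(1252).GetBytes(text);
}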

Since the question was not very clear, someone reasonably pointed out that you might actually need the reverse:

public string Decode(byte[] data)
{
    return Encoding.GetEncoding(1252).GetString(data);
}
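Put together as one runnable program, the two methods behave as shown below. This sketch assumes .NET 5 or later (hence the provider registration, and Encoding.Latin1 for comparison) and turns the original instance methods into local functions. Byte 0x80 is included because it is one of the places where Windows-1252 and Latin-1 disagree.

```csharp
using System;
using System.Text;

// On .NET Core / .NET 5+, code page 1252 is unavailable until this provider is registered.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// The two methods from the answer, written as local functions.
byte[] Encode(string text) => Encoding.GetEncoding(1252).GetBytes(text);
string Decode(byte[] data) => Encoding.GetEncoding(1252).GetString(data);

byte[] encoded = Encode("caf\u00E9");      // é is the single byte 0xE9 in 1252
string roundTrip = Decode(encoded);
string euro = Decode(new byte[] { 0x80 }); // 0x80 is the euro sign (U+20AC) in 1252...
string latin1 = Encoding.Latin1.GetString(new byte[] { 0x80 }); // ...but control char U+0080 in Latin-1

Console.WriteLine(BitConverter.ToString(encoded)); // 63-61-66-E9
Console.WriteLine(roundTrip == "caf\u00E9");       // True
Console.WriteLine(euro == "\u20AC");               // True
Console.WriteLine(latin1 == "\u0080");             // True
```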

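This is probably the easiest way, with one caveat: Encoding.Default is the system ANSI code page only on .NET Framework. On .NET Core and .NET 5+ it is always UTF-8, so there you should name the code page explicitly, e.g. Encoding.GetEncoding(1252).

byte[] ansiBytes = File.ReadAllBytes("inputfilename.txt");
// Decode with the ANSI code page (Encoding.Default means that on .NET Framework only).
var text = Encoding.Default.GetString(ansiBytes);
// File.WriteAllText without an encoding argument writes UTF-8 without a BOM.
File.WriteAllText("outputfilename.txt", text);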

I would recommend reading this: http://www.joelonsoftware.com/articles/Unicode.html.
If you are going to read an ANSI file, you need to know the code page it was saved in.
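To see why the code page matters, here is a sketch that decodes the same single byte under two Windows code pages, assuming .NET 5 or later with CodePagesEncodingProvider registered. The byte 0xE9 is an arbitrary example; nothing in the data itself says which code page is right.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ to access Windows code pages 1251 and 1252.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = { 0xE9 };

string asWestern = Encoding.GetEncoding(1252).GetString(data);  // Western European: é
string asCyrillic = Encoding.GetEncoding(1251).GetString(data); // Cyrillic: й

// Print code points so the result does not depend on the console's own encoding.
Console.WriteLine($"U+{(int)asWestern[0]:X4} vs U+{(int)asCyrillic[0]:X4}"); // U+00E9 vs U+0439
```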

This is probably happening because your original string text already contains invalid characters. Encoding conversion only makes sense when your input is a byte array, so you should read the file as a byte array instead of a string or, as Henk said, specify the encoding when reading the file.
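A sketch of the difference between decoding too late and decoding correctly, assuming a .NET 5 or later runtime and a hypothetical file whose bytes spell "café" in Windows-1252. The bytes are inlined instead of read from disk to keep the example self-contained.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical file contents: "café" saved in Windows-1252.
byte[] fileBytes = { 0x63, 0x61, 0x66, 0xE9 };

// Wrong: decoding as UTF-8 (what File.ReadAllText does when there is no BOM)
// turns the invalid byte 0xE9 into U+FFFD, the replacement character.
// The original letter is gone; no later re-encoding of this string can restore it.
string wrong = Encoding.UTF8.GetString(fileBytes);

// Right: convert while the data is still bytes, using the encoding it was written in.
string right = Encoding.GetEncoding(1252).GetString(fileBytes);

Console.WriteLine(wrong.Contains('\uFFFD')); // True
Console.WriteLine(right == "caf\u00E9");     // True
```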

Lloyd

My thought here is that when you save the file in Notepad++, it inserts a byte order mark (BOM), from which the browser can infer that the file is UTF-8. Otherwise you would probably have to tell the browser the character encoding explicitly, for example with a meta charset tag in HTML, the encoding attribute of the XML declaration, or the HTTP Content-Type header.
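A sketch of the two ways of writing the same page from C#, assuming hypothetical temp-file names: UTF-8 with a BOM through new UTF8Encoding(true), versus the plain File.WriteAllText overload, which writes UTF-8 without one.

```csharp
using System;
using System.IO;
using System.Text;

string withBom = Path.Combine(Path.GetTempPath(), "with-bom.html");
string withoutBom = Path.Combine(Path.GetTempPath(), "without-bom.html");
string html = "<p>caf\u00E9</p>";

// UTF8Encoding(true) makes the file start with the byte order mark EF BB BF.
File.WriteAllText(withBom, html, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

// The overload without an Encoding argument writes UTF-8 with no BOM.
File.WriteAllText(withoutBom, html);

byte[] a = File.ReadAllBytes(withBom);
byte[] b = File.ReadAllBytes(withoutBom);
Console.WriteLine(BitConverter.ToString(a, 0, 3)); // EF-BB-BF
Console.WriteLine(a.Length - b.Length);            // 3
```

Which option is better depends on the consumer: browsers honor a UTF-8 BOM, but some tools treat it as unexpected bytes at the start of the file.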
