Determining text file encoding schema

僤鯓⒐⒋嵵緔 submitted on 2021-02-08 05:03:35

Question


I am trying to write a method that detects the encoding of a text file. I know there are many encodings out there, but I know for sure my text file will be either ASCII, UTF-8, or UTF-16, so I only need to tell these three apart. Does anyone know a way to do this?


Answer 1:


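Use a StreamReader to identify the encoding. It detects the encoding from the file's byte order mark (BOM), if there is one, and otherwise keeps the encoding you pass in. Detection only happens on the first read, so check CurrentEncoding after reading.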

Example:

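using (var r = new StreamReader(filename, Encoding.Default))
{
    // Reading triggers BOM detection; CurrentEncoding is only meaningful afterwards.
    richtextBox1.Text = r.ReadToEnd();
    // Detected from the BOM, or Encoding.Default when the file has none.
    var encoding = r.CurrentEncoding;
}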



Answer 2:


First, open the file in binary mode and read it into memory.

For UTF-8 (or ASCII), do a validation check. You can decode the text using Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes) and catch the exception. If you don't get one, the data is valid UTF-8. Here is the code:

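private bool detectUTF8Encoding(string filename)
{
    byte[] bytes = File.ReadAllBytes(filename);
    try {
        // Strict fallbacks make the decoder throw on any invalid byte sequence
        // instead of silently substituting replacement characters.
        Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes);
        return true;
    } catch (DecoderFallbackException) {
        // Only decoding failures mean "not UTF-8"; other exceptions propagate.
        return false;
    }
}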
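
Because that validation also succeeds for plain ASCII (every ASCII file is valid UTF-8), telling the two apart needs one more check. A minimal sketch, assuming "ASCII" means that no byte has its high bit set; the class and method names here are hypothetical and not part of the original answer:

```csharp
using System;
using System.Linq;

static class AsciiCheck
{
    // Hypothetical helper: true when every byte is in the 7-bit ASCII range (0x00-0x7F).
    // An empty array also counts as ASCII.
    public static bool IsPureAscii(byte[] bytes)
    {
        return bytes.All(b => b < 0x80);
    }
}
```

If the bytes pass UTF-8 validation and IsPureAscii returns true, the file can be reported as ASCII. If they pass validation but contain a byte of 0x80 or above, it is UTF-8.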

For UTF-16, check for a byte order mark (BOM) at the start of the file: FE FF indicates big-endian and FF FE indicates little-endian.
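
The BOM check could be sketched roughly as follows. The class and method names are hypothetical. The sketch assumes the file is one of the three encodings from the question, so it does not rule out UTF-32, whose little-endian BOM also begins with FF FE. A UTF-16 file saved without a BOM will not be recognized:

```csharp
using System;
using System.Text;

static class Utf16BomCheck
{
    // Hypothetical helper: returns the UTF-16 encoding indicated by a leading BOM,
    // or null when the data does not start with one.
    public static Encoding DetectUtf16Bom(byte[] bytes)
    {
        if (bytes.Length >= 2)
        {
            if (bytes[0] == 0xFE && bytes[1] == 0xFF) return Encoding.BigEndianUnicode; // UTF-16 BE
            if (bytes[0] == 0xFF && bytes[1] == 0xFE) return Encoding.Unicode;          // UTF-16 LE
        }
        return null;
    }
}
```

Running this check on the bytes from File.ReadAllBytes before the UTF-8 validation keeps the order simple: BOM first, then validation.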



Source: https://stackoverflow.com/questions/10522570/determining-text-file-encoding-schema
