Question
I am trying to create a method that can detect the encoding scheme of a text file. I know there are many encodings out there, but I know for sure my text file will be either ASCII, UTF-8, or UTF-16. I only need to detect these three. Does anyone know a way to do this?
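Answer 1:
Use a StreamReader to identify the encoding. It detects the encoding from the file's byte order mark (BOM), and CurrentEncoding reflects the result only after the first read. A file with no BOM is reported as the fallback encoding passed to the constructor.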
Example:
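// Requires: using System.IO; using System.Text;
// The third argument (true) asks the reader to detect the encoding from the BOM.
using (var r = new StreamReader(filename, Encoding.Default, true))
{
    richtextBox1.Text = r.ReadToEnd();
    // Read CurrentEncoding only after a read; before that it is still the fallback.
    var encoding = r.CurrentEncoding;
}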
Answer 2:
First, open the file in binary mode and read it into memory.
For UTF-8 (or ASCII, which is a subset of UTF-8), do a validation check. You can decode the text using Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes) and catch the exception. If you don't get one, the data is valid UTF-8. Here is the code:
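// Requires: using System.IO; using System.Text;
private bool detectUTF8Encoding(string filename)
{
    byte[] bytes = File.ReadAllBytes(filename);
    try
    {
        // The exception fallbacks make the decoder throw on invalid bytes
        // instead of silently substituting a replacement character.
        Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        // An invalid byte sequence means the file is not UTF-8.
        return false;
    }
}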
For UTF-16, check for the BOM at the start of the file: FE FF for big-endian, or FF FE for little-endian.
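The checks in this answer could be combined roughly as follows. This is only a sketch under a few assumptions: the names DetectedEncoding and EncodingSniffer are made up for illustration, UTF-16 is recognized only when a BOM is present (BOM-less UTF-16 is not detected, and ASCII-only text stored that way would be misreported as ASCII), and the file is assumed to be one of the three encodings from the question, so an FF FE prefix is never treated as UTF-32. It uses new UTF8Encoding(false, true) as an equivalent way to get a decoder that throws on invalid bytes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical result type, introduced only for this sketch.
public enum DetectedEncoding { Ascii, Utf8, Utf16LittleEndian, Utf16BigEndian, Unknown }

// Hypothetical helper class, not part of .NET.
public static class EncodingSniffer
{
    // Strict UTF-8: does not emit a BOM, throws on invalid byte sequences.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    public static DetectedEncoding Detect(byte[] bytes)
    {
        // UTF-16 BOM: FF FE is little-endian, FE FF is big-endian.
        if (bytes.Length >= 2)
        {
            if (bytes[0] == 0xFF && bytes[1] == 0xFE) return DetectedEncoding.Utf16LittleEndian;
            if (bytes[0] == 0xFE && bytes[1] == 0xFF) return DetectedEncoding.Utf16BigEndian;
        }

        // Pure ASCII: every byte is below 0x80 (an empty file also lands here).
        if (Array.TrueForAll(bytes, b => b < 0x80)) return DetectedEncoding.Ascii;

        try
        {
            // Throws DecoderFallbackException if the bytes are not valid UTF-8.
            StrictUtf8.GetString(bytes);
            return DetectedEncoding.Utf8;
        }
        catch (DecoderFallbackException)
        {
            return DetectedEncoding.Unknown;
        }
    }

    public static DetectedEncoding DetectFile(string filename)
    {
        return Detect(File.ReadAllBytes(filename));
    }
}
```

Calling EncodingSniffer.DetectFile(filename) then gives one of the five enum values. The BOM test runs first because UTF-16 text would almost never pass the ASCII or UTF-8 checks in a meaningful way.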
Source: https://stackoverflow.com/questions/10522570/determining-text-file-encoding-schema