Is there any way to determine a string\'s encoding in C#?
Say, I have a filename string, but I don\'t know if it is encoded in Unicode UTF-16 or the
My solution is to use built-in stuffs with some fallbacks.
I picked the strategy from an answer to another similar question on stackoverflow but I can't find it now.
It checks the BOM first using the built-in logic in StreamReader, if there's BOM, the encoding will be something other than Encoding.Default, and we should trust that result.
If not, it checks whether the bytes sequence is valid UTF-8 sequence. if it is, it will guess UTF-8 as the encoding, and if not, again, the default ASCII encoding will be the result.
static Encoding getEncoding(string path) {
var stream = new FileStream(path, FileMode.Open);
var reader = new StreamReader(stream, Encoding.Default, true);
reader.Read();
if (reader.CurrentEncoding != Encoding.Default) {
reader.Close();
return reader.CurrentEncoding;
}
stream.Position = 0;
reader = new StreamReader(stream, new UTF8Encoding(false, true));
try {
reader.ReadToEnd();
reader.Close();
return Encoding.UTF8;
}
catch (Exception) {
reader.Close();
return Encoding.Default;
}
}