Is there any way to determine a string's encoding in C#?
Say, I have a filename string, but I don't know if it is encoded in Unicode UTF-16 or the
I know this is a bit late - but to be clear:
A string doesn't really have an encoding. In .NET, a string is a sequence of char values (UTF-16 code units). Essentially, if it is a string, it has already been decoded.
However, if you are reading the contents of a file, which is made of bytes, and wish to convert them to a string, then the file's encoding must be used.
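To make the distinction concrete, here is a minimal sketch (the class name DecodeDemo and the sample text are made up for illustration). It shows that one string maps to different byte sequences depending on the encoding, and that decoding with the wrong encoding does not give the original text back:

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    public static void Main()
    {
        // Illustrative sample text: "na" + U+00EF (i with diaeresis) + "ve", 5 chars.
        string text = "na\u00EFve";

        // The same string yields different bytes under different encodings.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] utf16 = Encoding.Unicode.GetBytes(text); // Encoding.Unicode is UTF-16 LE

        Console.WriteLine(utf8.Length);   // 6  (U+00EF takes two bytes in UTF-8)
        Console.WriteLine(utf16.Length);  // 10 (two bytes per char here)

        // Decoding with the matching encoding recovers the string.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == text);    // True

        // Decoding the UTF-8 bytes as UTF-16 produces different text.
        Console.WriteLine(Encoding.Unicode.GetString(utf8) == text); // False
    }
}
```

The bytes themselves carry no label saying which encoding produced them, which is why the question only makes sense for byte data and not for a string.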
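.NET includes encoding and decoding classes for ASCII, UTF-7, UTF-8, UTF-16, UTF-32 and more.
The Unicode encodings among these (UTF-8, UTF-16, UTF-32) can begin with a byte-order mark (BOM) that identifies which encoding was used. The mark is optional, and ASCII has none, so a missing BOM tells you nothing about the encoding.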
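As a rough illustration of what checking for a byte-order mark involves, here is a sketch of a hand-written check. BomSniffer and DetectBom are hypothetical names, not part of .NET, and this is not meant to mirror how the framework does it internally:

```csharp
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: returns the encoding implied by a leading BOM,
    // or null when the data does not start with one.
    public static Encoding DetectBom(byte[] b)
    {
        // UTF-32 must be tested before UTF-16: the UTF-32 LE mark (FF FE 00 00)
        // starts with the UTF-16 LE mark (FF FE).
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32; // UTF-32 little-endian
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode; // UTF-16 little-endian
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian
        return null; // no BOM: the encoding cannot be told from a mark
    }
}
```

Note that the result is only a strong hint. The bytes FF FE 00 00 could in principle also be UTF-16 LE text that starts with a U+0000 character, and data without a mark returns null here even though it still has some encoding.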
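The .NET class System.IO.StreamReader can determine the encoding used within a stream by reading that byte-order mark. If no mark is present, it falls back to the encoding passed to its constructor, or to UTF-8 if none was given.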
Here is an example:
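// Requires: using System.IO; using System.Text;

/// <summary>
/// Returns the detected encoding and the contents of the file.
/// </summary>
/// <param name="fileName">Path of the file to read.</param>
/// <param name="contents">Receives the decoded text of the file.</param>
/// <returns>The encoding the reader used to decode the file.</returns>
public static Encoding DetectEncoding(String fileName, out String contents)
{
    // Open the file; 'true' asks the reader to look for a byte-order mark.
    using (StreamReader reader = new StreamReader(fileName, true))
    {
        // Read the contents first: CurrentEncoding is only reliable after a read.
        contents = reader.ReadToEnd();
        // Return the encoding the reader settled on.
        return reader.CurrentEncoding;
    }
}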
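
For completeness, here is one way the method could be exercised. This is a sketch only: the class name DetectEncodingDemo, the temp file, and the sample text are assumptions made for the demo, and the method is repeated so the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;

public static class DetectEncodingDemo
{
    // Copy of the method above so the sketch is self-contained.
    public static Encoding DetectEncoding(string fileName, out string contents)
    {
        using (StreamReader reader = new StreamReader(fileName, true))
        {
            contents = reader.ReadToEnd();
            return reader.CurrentEncoding;
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Encoding.Unicode is UTF-16 LE; File.WriteAllText writes its BOM first.
            File.WriteAllText(path, "hello", Encoding.Unicode);

            string contents;
            Encoding detected = DetectEncoding(path, out contents);

            Console.WriteLine(detected.WebName); // name of the detected encoding
            Console.WriteLine(contents);         // decoded file text
        }
        finally
        {
            File.Delete(path); // clean up the temp file
        }
    }
}
```

With the file written as above, the reader should report utf-16 and return "hello". For a file saved without a byte-order mark, the same call reports UTF-8 whatever the real encoding is, so treat the result as a best guess in that case.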