Is there any way to determine a string's encoding in C#?
Say, I have a filename string, but I don't know if it is encoded in Unicode UTF-16 or the
I found a new library on GitHub: CharsetDetector/UTF-unknown
It is a charset detector built in C# for .NET Core 2-3, .NET Standard 1-2 and .NET 4+.
It is also a port of the Mozilla Universal Charset Detector, based on other repositories.
CharsetDetector/UTF-unknown has a class named CharsetDetector.
CharsetDetector contains some static encoding detection methods:
  CharsetDetector.DetectFromFile()
  CharsetDetector.DetectFromStream()
  CharsetDetector.DetectFromBytes()
The detection result is a DetectionResult. Its Detected property is an instance of DetectionDetail, which has the following properties:
  EncodingName
  Encoding
  Confidence
Below is an example showing usage:
// Program.cs
using System;
using System.Text;
using UtfUnknown;

namespace ConsoleExample
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string filename = @"E:\new-file.txt";
            DetectDemo(filename);
        }

        /// <summary>
        /// Command line example: detect the encoding of the given file.
        /// </summary>
        /// <param name="filename">a filename</param>
        public static void DetectDemo(string filename)
        {
            // Detect from file
            DetectionResult result = CharsetDetector.DetectFromFile(filename);
            // Get the best detection
            DetectionDetail resultDetected = result.Detected;

            // The detected result may be null.
            if (resultDetected != null)
            {
                // Get the alias of the found encoding
                string encodingName = resultDetected.EncodingName;
                // Get the System.Text.Encoding of the found encoding (can be null if not available)
                Encoding encoding = resultDetected.Encoding;
                // Get the confidence of the found encoding (between 0 and 1)
                float confidence = resultDetected.Confidence;

                if (encoding != null)
                {
                    Console.WriteLine($"Detection completed: {filename}");
                    Console.WriteLine($"EncodingWebName: {encoding.WebName}{Environment.NewLine}Confidence: {confidence}");
                }
                else
                {
                    Console.WriteLine($"Detection completed: {filename}");
                    Console.WriteLine($"(Encoding is null){Environment.NewLine}EncodingName: {encodingName}{Environment.NewLine}Confidence: {confidence}");
                }
            }
            else
            {
                Console.WriteLine($"Detection failed: {filename}");
            }
        }
    }
}
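One side note: a C# string is always UTF-16 in memory, so detection only makes sense on the raw bytes before they are decoded. If the data happens to start with a byte order mark (BOM), a dependency-free check may be enough before reaching for a statistical detector. The following is only a minimal sketch of that simpler technique and is not part of UTF-unknown; the names BomSniffer and DetectByBom are made up for illustration, and it assumes you already have the bytes in memory.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: returns the encoding implied by a leading
    // byte order mark, or null when no known BOM is present.
    public static Encoding DetectByBom(byte[] data)
    {
        if (data == null) return null;

        // UTF-32 LE is checked before UTF-16 LE because its BOM
        // (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return Encoding.UTF32;
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return new UTF32Encoding(bigEndian: true, byteOrderMark: true);
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith(data, 0xFF, 0xFE)) return Encoding.Unicode;
        if (StartsWith(data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;

        // No BOM: the encoding cannot be determined this way.
        return null;
    }

    // True when data begins with exactly the given bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A null result here only means there was no BOM, which is common for UTF-8 and for legacy code pages; in that case a detector such as CharsetDetector.DetectFromBytes() is the next thing to try. Also note that UTF-16 LE data whose first character after the BOM is U+0000 would be reported as UTF-32 by this check.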