Determine a string's encoding in C#

小鲜肉 2020-11-22 14:54

Is there any way to determine a string's encoding in C#?

Say, I have a filename string, but I don't know if it is encoded in Unicode UTF-16 or the

9 Answers
  •  我在风中等你
    2020-11-22 15:37

    I found a new library on GitHub: CharsetDetector/UTF-unknown

    A charset detector built in C#, targeting .NET Core 2-3, .NET Standard 1-2 and .NET 4+.

    It is also a port of the Mozilla Universal Charset Detector, based on other repositories.

    CharsetDetector/UTF-unknown has a class named CharsetDetector.

    CharsetDetector provides several static methods for detecting an encoding:

    • CharsetDetector.DetectFromFile()
    • CharsetDetector.DetectFromStream()
    • CharsetDetector.DetectFromBytes()

    The detection result is returned as a DetectionResult. Its Detected property is an instance of DetectionDetail (it may be null if nothing was detected) with the following properties:

    • EncodingName
    • Encoding
    • Confidence
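
    When the data is already in memory rather than in a file, DetectFromBytes can be used in the same way. The following is only a minimal sketch, assuming the library is already referenced in the project; the class name DetectFromBytesSketch, the helper Describe, and the sample text are made up for illustration and are not part of the library.

```csharp
using System;
using System.Text;
using UtfUnknown;

public static class DetectFromBytesSketch
{
    // Hypothetical helper: runs detection on an in-memory buffer
    // and summarizes the best match as a single line of text.
    public static string Describe(byte[] bytes)
    {
        DetectionResult result = CharsetDetector.DetectFromBytes(bytes);
        DetectionDetail best = result.Detected; // may be null if nothing was detected

        if (best == null)
        {
            return "No encoding detected.";
        }

        // Encoding can be null when no matching System.Text.Encoding is available.
        string webName = best.Encoding != null ? best.Encoding.WebName : "(not available)";
        return $"EncodingName: {best.EncodingName}, WebName: {webName}, Confidence: {best.Confidence}";
    }

    public static void Main()
    {
        // Sample input, assumed for illustration: Cyrillic text encoded as UTF-8.
        byte[] bytes = Encoding.UTF8.GetBytes("Привет, мир! Это пример текста в кодировке UTF-8.");
        Console.WriteLine(Describe(bytes));
    }
}
```

    Note that detection works on raw bytes. A .NET string is always UTF-16 in memory, so the bytes have to be examined before they are decoded into a string.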

    Below is a complete example that shows the usage with DetectFromFile:

    // Program.cs
    using System;
    using System.Text;
    using UtfUnknown;
    
    namespace ConsoleExample
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                string filename = @"E:\new-file.txt";
                DetectDemo(filename);
            }
    
            /// <summary>
            /// Command line example: detect the encoding of the given file.
            /// </summary>
            /// <param name="filename">a filename</param>
            public static void DetectDemo(string filename)
            {
                // Detect from File
                DetectionResult result = CharsetDetector.DetectFromFile(filename);
                // Get the best Detection
                DetectionDetail resultDetected = result.Detected;
    
                // detected result may be null.
                if (resultDetected != null)
                {
                    // Get the alias of the found encoding
                    string encodingName = resultDetected.EncodingName;
                    // Get the System.Text.Encoding of the found encoding (can be null if not available)
                    Encoding encoding = resultDetected.Encoding;
                    // Get the confidence of the found encoding (between 0 and 1)
                    float confidence = resultDetected.Confidence;
    
                    if (encoding != null)
                    {
                        Console.WriteLine($"Detection completed: {filename}");
                        Console.WriteLine($"EncodingWebName: {encoding.WebName}{Environment.NewLine}Confidence: {confidence}");
                    }
                    else
                    {
                        Console.WriteLine($"Detection completed: {filename}");
                        Console.WriteLine($"(Encoding is null){Environment.NewLine}EncodingName: {encodingName}{Environment.NewLine}Confidence: {confidence}");
                    }
                }
                else
                {
                    Console.WriteLine($"Detection failed: {filename}");
                }
            }
        }
    }
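
    Once an encoding has been detected, the usual next step is to decode the bytes with it. Since the Encoding property can be null, some fallback is needed. Here is a minimal sketch using only System.Text; the names DecodeSketch and Decode are hypothetical, the `detected` argument stands in for resultDetected.Encoding from the example above, and falling back to UTF-8 is an assumption of this sketch rather than anything the library prescribes.

```csharp
using System;
using System.Text;

public static class DecodeSketch
{
    // Hypothetical helper: decodes raw bytes with the detected encoding,
    // or with the given fallback when detection produced no usable Encoding.
    public static string Decode(byte[] bytes, Encoding detected, Encoding fallback)
    {
        Encoding encoding = detected ?? fallback;
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Case 1: a detector reported UTF-16 (Encoding.Unicode) for these bytes.
        byte[] utf16Bytes = Encoding.Unicode.GetBytes("new-file.txt");
        Console.WriteLine(Decode(utf16Bytes, Encoding.Unicode, Encoding.UTF8)); // prints "new-file.txt"

        // Case 2: no Encoding was available (null), so the UTF-8 fallback is used.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("new-file.txt");
        Console.WriteLine(Decode(utf8Bytes, null, Encoding.UTF8)); // prints "new-file.txt"
    }
}
```

    Whether a silent fallback is acceptable depends on the application. When the Confidence value is low, it may be safer to report the problem than to guess.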
    
