How to use ReadAllText when the file encoding is unknown

轮回少年 2020-12-11 03:29

I'm reading a file with ReadAllText:

    string[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');

    int i = 0;

    fore         


        
2 Answers
  • 2020-12-11 04:16

    You have to check the file encoding first. Try this:

    using System.IO;
    using System.Text;

    Encoding enc = null;
    using (FileStream file = new FileStream(filePath,
        FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        if (file.CanSeek)
        {
            byte[] bom = new byte[4]; // Get the byte-order mark, if there is one
            int read = file.Read(bom, 0, 4); // may be fewer than 4 bytes for very short files

            if (read >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
                enc = Encoding.UTF8;                                  // UTF-8
            else if (read >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0)
                enc = Encoding.UTF32;                                 // UTF-32LE (must be tested before UTF-16LE)
            else if (read >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
                enc = Encoding.Unicode;                               // UTF-16LE
            else if (read >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
                enc = Encoding.BigEndianUnicode;                      // UTF-16BE
            else if (read >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff)
                enc = new UTF32Encoding(bigEndian: true, byteOrderMark: true); // UTF-32BE
            else
                enc = Encoding.ASCII;                                 // no BOM: fall back to a default

            // Now reposition the file cursor back to the start of the file
            file.Seek(0, SeekOrigin.Begin);
        }
        else
        {
            // The file cannot be randomly accessed, so you need to choose a default
            // based on where the data comes from. If you expect data from many older
            // applications, default to Encoding.ASCII; if you expect data from many
            // newer applications, default to Encoding.Unicode (UTF-16). Binary files
            // are byte-based, so Encoding.ASCII is the usual choice there, although
            // you will probably never need an encoding for them, since the Encoding
            // classes exist to turn the file's bytes into strings.
            enc = Encoding.ASCII;
        }
    }
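To tie this back to the question, the detected encoding can be passed to the File.ReadAllText(string, Encoding) overload. The following is a minimal sketch, not part of the original answer: the class and method names (BomSniffer, DetectFromBom, ReadValues) are hypothetical, and it assumes that when no BOM is present the caller supplies the fallback encoding instead of hard-coding ASCII.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: maps a byte-order mark at the start of the file to an Encoding.
    // Returns null when no BOM is found, so the caller decides on the fallback.
    public static Encoding DetectFromBom(string path)
    {
        var bom = new byte[4];
        int read;
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            read = file.Read(bom, 0, 4); // fewer than 4 bytes for very short files
        }

        if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
            return Encoding.UTF8;                                          // UTF-8
        if (read >= 4 && bom[0] == 0xFF && bom[1] == 0xFE && bom[2] == 0 && bom[3] == 0)
            return Encoding.UTF32;                                         // UTF-32LE, before UTF-16LE
        if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
            return Encoding.Unicode;                                       // UTF-16LE
        if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
            return Encoding.BigEndianUnicode;                              // UTF-16BE
        if (read >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xFE && bom[3] == 0xFF)
            return new UTF32Encoding(bigEndian: true, byteOrderMark: true); // UTF-32BE
        return null;
    }

    // Hypothetical helper: reads the file with the detected encoding (or the fallback)
    // and splits it on ';' as in the question.
    public static string[] ReadValues(string path, Encoding fallback)
    {
        Encoding enc = DetectFromBom(path) ?? fallback;
        return File.ReadAllText(path, enc).Split(';');
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "café;naïve", Encoding.Unicode); // written with a UTF-16LE BOM
        string[] values = ReadValues(path, Encoding.UTF8);
        Console.WriteLine(string.Join("|", values));
        File.Delete(path);
    }
}
```

Returning null for "no BOM" keeps the choice of default (ASCII, UTF-8, a legacy code page) with the caller, which is the decision the comment in the answer describes.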
    
  • 2020-12-11 04:22

    The only reliable way to do this is to look for a byte order mark (BOM) at the start of the text file. The BOM primarily signals the byte order (endianness) of the encoding, but its byte pattern also identifies the encoding itself, e.g. UTF-8, UTF-16 or UTF-32. Unfortunately, this only works for Unicode encodings, and only when the file actually starts with a BOM; for older, pre-Unicode encodings you have to fall back on much less reliable heuristics.
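To see the byte patterns involved, the sketch below (an illustration, not from the original answer; the BomTable class name is hypothetical) asks .NET for each encoding's preamble via Encoding.GetPreamble(), which is the BOM it writes at the start of a file. Encoding.ASCII has an empty preamble, which is why BOM detection cannot identify pre-Unicode encodings.

```csharp
using System;
using System.Text;

public static class BomTable
{
    // Returns the preamble (BOM) an encoding writes, formatted as hex bytes, e.g. "EF-BB-BF".
    public static string PreambleHex(Encoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble());

    public static void Main()
    {
        var encodings = new (string Name, Encoding Enc)[]
        {
            ("UTF-8", new UTF8Encoding(encoderShouldEmitUTF8Identifier: true)),
            ("UTF-16LE", new UnicodeEncoding(bigEndian: false, byteOrderMark: true)),
            ("UTF-16BE", new UnicodeEncoding(bigEndian: true, byteOrderMark: true)),
            ("UTF-32LE", new UTF32Encoding(bigEndian: false, byteOrderMark: true)),
            ("UTF-32BE", new UTF32Encoding(bigEndian: true, byteOrderMark: true)),
            ("ASCII", Encoding.ASCII), // no preamble at all
        };

        foreach (var (name, enc) in encodings)
            Console.WriteLine($"{name,-9} {PreambleHex(enc)}");
    }
}
```

This prints EF-BB-BF for UTF-8, FF-FE for UTF-16LE, FE-FF for UTF-16BE, FF-FE-00-00 for UTF-32LE, 00-00-FE-FF for UTF-32BE, and nothing for ASCII. Because the UTF-32LE mark begins with the UTF-16LE mark, a detector has to test for the longer pattern first.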

    The StreamReader type can detect these marks and determine the encoding for you; you simply need to pass true for its detectEncodingFromByteOrderMarks parameter:

    var streamReader = new System.IO.StreamReader("path", true);
    

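    You can then check streamReader.CurrentEncoding to find out which encoding the file uses. Note, however, that detection only happens once the reader has actually read from the file (for example after a call to Read or Peek), so check CurrentEncoding after reading. Also, if there is no byte order mark, CurrentEncoding keeps the reader's default encoding, which for this constructor is UTF-8, not Encoding.Default.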
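A minimal sketch of the StreamReader approach, assuming you only need to know which encoding was detected (the class and method names are hypothetical). It calls Peek() before reading CurrentEncoding, because the reader only inspects the BOM once it fills its buffer.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderDetection
{
    // Hypothetical helper: opens the file with BOM detection enabled and returns
    // the encoding StreamReader settled on.
    public static Encoding DetectWithStreamReader(string path)
    {
        using (var streamReader = new StreamReader(path, detectEncodingFromByteOrderMarks: true))
        {
            // Detection happens when the reader first fills its buffer, so Peek() first;
            // before that, CurrentEncoding still reports the constructor's default (UTF-8).
            streamReader.Peek();
            return streamReader.CurrentEncoding;
        }
    }

    public static void Main()
    {
        string withBom = Path.GetTempFileName();
        string withoutBom = Path.GetTempFileName();
        File.WriteAllText(withBom, "a;b;c", Encoding.Unicode); // UTF-16LE, starts with FF FE
        File.WriteAllText(withoutBom, "a;b;c");                // UTF-8 without a BOM

        Console.WriteLine(DetectWithStreamReader(withBom).WebName);
        Console.WriteLine(DetectWithStreamReader(withoutBom).WebName);

        File.Delete(withBom);
        File.Delete(withoutBom);
    }
}
```

For the UTF-16LE file this reports "utf-16" (code page 1200); for the BOM-less file it reports the constructor's UTF-8 default. In the question's case you could call ReadToEnd() on the same reader and split the result on ';' instead of using File.ReadAllText.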

    Refer to the CodeProject solution for detecting the encoding.
