Convert a string's character encoding from windows-1252 to utf-8

囚心锁ツ 2020-11-30 11:26

I converted a Word document (docx) to HTML; the converted HTML has windows-1252 as its character encoding. In .NET, for this 1252 character encoding all the special charac

4 Answers
  • 2020-11-30 11:38

    Use the Encoding.Convert method. Details are in the MSDN article on Encoding.Convert.
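
    As a minimal sketch of that call, assuming the input is already available as raw windows-1252 bytes: the class and method names below are hypothetical, and the RegisterProvider line assumes .NET Core or .NET 5+, where code page 1252 is not registered by default (on .NET Framework it can be dropped).

    ```csharp
    using System;
    using System.Text;

    public static class ConvertSketch
    {
        // Hypothetical helper: re-encodes windows-1252 bytes as UTF-8 bytes.
        public static byte[] Windows1252ToUtf8(byte[] source)
        {
            // Assumption: running on .NET Core / .NET 5+, which needs this
            // registration before GetEncoding(1252) can succeed.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            Encoding windows1252 = Encoding.GetEncoding(1252);
            return Encoding.Convert(windows1252, Encoding.UTF8, source);
        }
    }
    ```

    For example, the single windows-1252 byte 0xE9 (é) comes back as the two UTF-8 bytes 0xC3 0xA9.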

  • 2020-11-30 11:39

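    Actually, the problem lies here:

    byte[] wind1252Bytes = wind1252.GetBytes(strHtml);

    We should not get the bytes from the HTML string, because by then the text has already been decoded into a .NET string. Read the raw bytes from the file instead. I tried the code below and it worked.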

    Encoding wind1252 = Encoding.GetEncoding(1252);
    Encoding utf8 = Encoding.UTF8;
    byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
    byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
    string utf8String = Encoding.UTF8.GetString(utf8Bytes);
    
    
    public static byte[] ReadFile(string filePath)
    {
        // the using block closes the stream even if a read fails
        using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            int length = (int)fileStream.Length;  // get file length
            byte[] buffer = new byte[length];     // create buffer
            int count;                            // actual number of bytes read
            int sum = 0;                          // total number of bytes read

            // read until Read method returns 0 (end of the stream has been reached)
            while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
                sum += count;  // sum is a buffer offset for next reading

            return buffer;
        }
    }
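
    A shorter route to the same result, offered only as a sketch: File.ReadAllText can decode the file with a given encoding, and File.WriteAllText can save the text as UTF-8. The class and method names are hypothetical, and the RegisterProvider line again assumes .NET Core or .NET 5+.

    ```csharp
    using System.IO;
    using System.Text;

    public static class HtmlFileSketch
    {
        // Hypothetical helper: decodes a windows-1252 file into a .NET string.
        public static string ReadWindows1252(string path)
        {
            // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return File.ReadAllText(path, Encoding.GetEncoding(1252));
        }

        // Hypothetical helper: saves the text as UTF-8 without a byte order mark.
        public static void WriteUtf8(string path, string text)
        {
            File.WriteAllText(path, text, new UTF8Encoding(false));
        }
    }
    ```

    The string returned by ReadWindows1252 has no byte encoding of its own; UTF-8 only comes into play when the text is written back out.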
    
  • 2020-11-30 11:44

    This should do it:

    Encoding wind1252 = Encoding.GetEncoding(1252);
    Encoding utf8 = Encoding.UTF8;  
    byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
    byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
    string utf8String = Encoding.UTF8.GetString(utf8Bytes);
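
    One thing worth checking before relying on this: when strHtml is already a correctly decoded .NET string, the sequence above encodes it and decodes it again, so the text comes back unchanged. The sketch below wraps the same calls in a hypothetical RoundTrip method to make that visible; the RegisterProvider line assumes .NET Core or .NET 5+.

    ```csharp
    using System.Text;

    public static class RoundTripSketch
    {
        // Hypothetical wrapper around the same three calls as above.
        public static string RoundTrip(string text)
        {
            // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            Encoding windows1252 = Encoding.GetEncoding(1252);
            byte[] bytes1252 = windows1252.GetBytes(text);
            byte[] utf8Bytes = Encoding.Convert(windows1252, Encoding.UTF8, bytes1252);
            return Encoding.UTF8.GetString(utf8Bytes);
        }
    }
    ```

    For text made only of characters that windows-1252 can represent, RoundTrip returns its input as-is; characters outside that code page are substituted along the way.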
    
  • 2020-11-30 11:48

    How are you planning to use the resulting HTML? In my opinion, the most appropriate way to solve your problem is to add a meta tag that specifies the encoding. Something like:

    <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
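
    The declared charset has to match the bytes actually served, so this only helps once the file itself is saved as UTF-8. If the converted HTML already carries a windows-1252 declaration, a rough sketch for switching it could look like the following; the class and method names are hypothetical, and the pattern assumes the declaration appears literally as charset=windows-1252.

    ```csharp
    using System.Text.RegularExpressions;

    public static class MetaCharsetSketch
    {
        // Hypothetical helper: rewrites a windows-1252 charset declaration to utf-8.
        public static string DeclareUtf8(string html)
        {
            return Regex.Replace(
                html,
                @"charset\s*=\s*windows-1252",
                "charset=utf-8",
                RegexOptions.IgnoreCase);
        }
    }
    ```

    HTML without such a declaration passes through unchanged, in which case the meta tag above would need to be added by hand.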
    