Convert a string's character encoding from windows-1252 to utf-8

I converted a Word document (docx) to HTML, and the converted HTML has windows-1252 as its character encoding. In .NET, for this 1252 character encoding all the special charac

4 Answers

隐瞒了意图╮

2020-11-30 11:38

Use the Encoding.Convert method. Details are in the MSDN article on Encoding.Convert.
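
A minimal sketch of what that call might look like on raw bytes. The class and method names here (`EncodingConvertSketch`, `Windows1252ToUtf8`) are made up for illustration. The `RegisterProvider` line assumes a modern .NET runtime (.NET Core / .NET 5+), where code page 1252 is not available until the code pages provider is registered; on .NET Framework that line is typically unnecessary.

```csharp
using System;
using System.Text;

public static class EncodingConvertSketch
{
    // Re-encodes a windows-1252 byte array as a UTF-8 byte array.
    public static byte[] Windows1252ToUtf8(byte[] windows1252Bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as 1252 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding windows1252 = Encoding.GetEncoding(1252);

        // Encoding.Convert decodes the source bytes and re-encodes them in one step.
        return Encoding.Convert(windows1252, Encoding.UTF8, windows1252Bytes);
    }
}
```

For example, the bytes 63 61 66 E9 ("café" in windows-1252) come back as 63 61 66 C3 A9, because é takes two bytes in UTF-8.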

野趣味

2020-11-30 11:39

Actually, the problem lies here:

byte[] wind1252Bytes = wind1252.GetBytes(strHtml);

We should not get the bytes from the HTML string; they should be read from the file itself. I tried the code below and it worked.

Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = ReadFile(Server.MapPath(HtmlFile));
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);


// Requires: using System.IO;
public static byte[] ReadFile(string filePath)
{
    // "using" disposes the stream even if an exception is thrown.
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        int length = (int)fileStream.Length;  // file length in bytes
        byte[] buffer = new byte[length];     // holds the whole file
        int count;                            // bytes returned by the current Read call
        int sum = 0;                          // total bytes read so far; also the next buffer offset

        // Read may return fewer bytes than requested, so loop until it returns 0.
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;

        return buffer;
    }
}
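
If the goal is just to get the file's text and save it back as UTF-8, the manual read loop could probably be skipped. A rough sketch, assuming the file really is windows-1252 on disk. `HtmlFileReader`, `ReadWindows1252File` and `WriteUtf8File` are hypothetical names, and the `RegisterProvider` call is only needed on .NET Core / .NET 5+.

```csharp
using System.IO;
using System.Text;

public static class HtmlFileReader
{
    // Decodes a windows-1252 file straight into a .NET string.
    public static string ReadWindows1252File(string filePath)
    {
        // Assumption: modern .NET, where code page 1252 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllText(filePath, Encoding.GetEncoding(1252));
    }

    // Saves the string as UTF-8 without a byte order mark.
    public static void WriteUtf8File(string filePath, string content)
    {
        File.WriteAllText(filePath, content, new UTF8Encoding(false));
    }
}
```

The string in between is ordinary .NET text, so no explicit Encoding.Convert call is needed; the encoding only matters at the point where bytes are read or written.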

既然无缘

2020-11-30 11:44

This should do it:

Encoding wind1252 = Encoding.GetEncoding(1252);
Encoding utf8 = Encoding.UTF8;  
byte[] wind1252Bytes = wind1252.GetBytes(strHtml);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
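
One thing worth checking with this approach: a .NET string is UTF-16 in memory, so encoding it to windows-1252 and converting those bytes back through UTF-8 should return the same text for every character windows-1252 can represent. A small sketch to verify that, with `RoundTripSketch` and `RoundTrip` as made-up names and the `RegisterProvider` line assuming .NET Core / .NET 5+:

```csharp
using System;
using System.Text;

public static class RoundTripSketch
{
    // Runs the string -> 1252 bytes -> UTF-8 bytes -> string sequence.
    public static string RoundTrip(string text)
    {
        // Assumption: modern .NET, where code page 1252 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding wind1252 = Encoding.GetEncoding(1252);
        byte[] wind1252Bytes = wind1252.GetBytes(text);
        byte[] utf8Bytes = Encoding.Convert(wind1252, Encoding.UTF8, wind1252Bytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

Calling `RoundTripSketch.RoundTrip("caf\u00E9 \u2013 \u20AC5")` gives back an identical string, since é, the en dash and the euro sign all exist in windows-1252. Characters outside that code page cannot survive the first step, so if the string was decoded incorrectly before this point, the conversion has to happen where the bytes are read.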

佛祖请我去吃肉

2020-11-30 11:48
How are you planning to use the resulting HTML? In my opinion, the most appropriate way to solve your problem would be to add a meta tag that specifies the encoding. Something like:
```
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
```
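
If the converted HTML already carries a windows-1252 declaration, it could be rewritten in code. A hedged sketch, assuming the declaration appears as an unquoted `charset=windows-1252` inside the existing meta tag; `MetaCharsetSketch` and `DeclareUtf8` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class MetaCharsetSketch
{
    // Replaces a "charset=windows-1252" declaration with "charset=UTF-8".
    // The match is case-insensitive and tolerates spaces around "=".
    public static string DeclareUtf8(string html)
    {
        return Regex.Replace(
            html,
            @"charset\s*=\s*windows-1252",
            "charset=UTF-8",
            RegexOptions.IgnoreCase);
    }
}
```

The declaration only tells the browser how to decode the bytes, so it has to match how the file is actually saved: after this change the file should be written out as UTF-8.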