Characters in string changed after downloading HTML from the internet

佛祖请我去吃肉 2020-11-27 04:55

Using the following code, I can download the HTML of a page from the internet:

WebClient wc = new WebClient();

// ....

string downloadedFile = wc.DownloadS

3 Answers
  •  遥遥无期
    2020-11-27 05:57

    Since I am not allowed to comment (insufficient reputation), I'll have to post an additional answer. I am using Mikael's great class routinely, but I encountered a practical problem with the regex that tries to find the charset meta-info. This

    Match m = new Regex(@"<meta\s+.*?charset\s*=\s*(?<charset>[A-Za-z0-9_-]+)", RegexOptions.Singleline | RegexOptions.IgnoreCase).Match(html);

    fails on this

    <meta charset="UTF-8"/>

    whereas this

    Match m = new Regex(@"<meta\s+.*?charset\s*=\s*""?(?<charset>[A-Za-z0-9_-]+)""?", RegexOptions.Singleline | RegexOptions.IgnoreCase).Match(html);
    

    does not.
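
The fix amounts to allowing an optional double quote on either side of the charset value, so that both the short `charset="..."` form and the older `content="...; charset=..."` form are matched. Below is a minimal sketch of that idea. The `<meta\s+.*?charset\s*=\s*` prefix, the `charset` group name, and the `CharsetSniffer` / `DetectCharset` names are assumptions made for illustration and are not necessarily Mikael's exact code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Assumption: the "<meta ... charset=" prefix and the group name "charset"
    // are illustrative. The optional quotes (""? in a verbatim string) are the
    // part that the answer above is about.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta\s+.*?charset\s*=\s*""?(?<charset>[A-Za-z0-9_-]+)""?",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null when no declaration is found.
    public static string DetectCharset(string html)
    {
        Match m = CharsetPattern.Match(html);
        return m.Success ? m.Groups["charset"].Value : null;
    }

    public static void Main()
    {
        // Short form: the value is wrapped in double quotes.
        Console.WriteLine(DetectCharset("<meta charset=\"UTF-8\"/>"));

        // Older form: no quote directly after "charset=".
        Console.WriteLine(DetectCharset(
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"));
    }
}
```

Note that this sketch only tolerates double quotes. A page that writes `charset='utf-8'` with single quotes would still not match, so the character class for the quote may need widening if such pages matter.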

    Thanks, Mikael.
