Can't download HTML data from an https URL using HtmlAgilityPack

Posted by 流过昼夜 on 2019-11-28 12:13:00
har07

HtmlWeb doesn't support downloading from https. Instead, you can use WebClient, modified slightly so that it decompresses GZip responses automatically:

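using System;
using System.Net;

// WebClient that transparently decompresses GZip and Deflate responses.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression; skip anything else.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}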

Then use HtmlDocument.LoadHtml() to populate your HtmlDocument instance from the downloaded HTML string:

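var url = "https://kat.cr/";
string data;
using (var client = new MyWebClient()) // WebClient is IDisposable
{
    data = client.DownloadString(url);
}
var doc = new HtmlDocument(); // from the HtmlAgilityPack namespace
doc.LoadHtml(data);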

Alternatively, you can keep using HtmlWeb and intercept the request through its PreRequest handler to modify it as your requirements dictate.

var page = new HtmlWeb()
{
  PreRequest = request =>
  {
    // Make any changes to the request object that will be used.
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    return true;
  }
};

var url = "https://kat.cr/";
var doc = page.Load(url); // Load returns an HtmlDocument, not a string