NullReferenceException in HtmlAgilityPack

Submitted by 天涯浪子 on 2019-12-04 07:07:11
Alex

This is a bug in HtmlAgilityPack. The document you are trying to parse contains <meta http-equiv="Content-Type" content="text/html; charset=iso-utf-8">, and HtmlAgilityPack cannot resolve the charset value (iso-utf-8) to a valid encoding name. As Simon Mourier said, the bug was introduced in version 1.4.0.0.
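To see why that charset value is a problem, here is a minimal sketch that uses only the .NET base class library. It relies on the fact that Encoding.GetEncoding throws an ArgumentException for a name it does not recognize. The class CharsetCheck and the helper TryGetEncoding are hypothetical names introduced for illustration; this is not HtmlAgilityPack's internal code, only a demonstration of the failure mode.

```csharp
using System;
using System.Text;

public static class CharsetCheck
{
    // Hypothetical helper: returns null instead of throwing when .NET
    // does not recognize the charset name.
    public static Encoding TryGetEncoding(string name)
    {
        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // Raised for names that are not valid encoding names,
            // such as "iso-utf-8".
            return null;
        }
    }
}
```

Calling CharsetCheck.TryGetEncoding("iso-utf-8") yields null, while CharsetCheck.TryGetEncoding("utf-8") yields a usable Encoding.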

To avoid this, load the document from a stream yourself and specify the encoding explicitly, like this:

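using System.Net;
using System.Text;
using HtmlAgilityPack;

// "url" is the address of the page to parse, defined elsewhere in your code.
var htmlDoc = new HtmlDocument();
// Do not let HtmlAgilityPack read the encoding declared in the page itself.
htmlDoc.OptionReadEncoding = false;
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
{
    using (var stream = response.GetResponseStream())
    {
        // Decode explicitly as UTF-8; this assumes the page really is UTF-8.
        htmlDoc.Load(stream, Encoding.UTF8);
    }
}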
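The same workaround can be wrapped in a small helper that does not care where the stream comes from, which makes it usable for files and in-memory data as well as HTTP responses. This is only a sketch: SafeHtmlLoader and LoadFromStream are hypothetical names, not part of HtmlAgilityPack, and the caller is assumed to know the correct encoding.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class SafeHtmlLoader
{
    // Hypothetical helper: loads HTML from any stream with a caller-supplied
    // encoding, skipping HtmlAgilityPack's own encoding detection.
    public static HtmlDocument LoadFromStream(Stream stream, Encoding encoding)
    {
        var htmlDoc = new HtmlDocument();
        // Ignore the charset declared in the document's <meta> tag.
        htmlDoc.OptionReadEncoding = false;
        htmlDoc.Load(stream, encoding);
        return htmlDoc;
    }
}
```

Passing response.GetResponseStream() and Encoding.UTF8 to LoadFromStream reproduces the snippet above.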