HtmlAgilityPack - How to set custom encoding when loading pages

半城伤御伤魂 submitted on 2019-12-01 14:01:36

You could try overriding the encoding on the HtmlWeb object, turning off auto-detection so that the override is used:

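// myEncoding is a System.Text.Encoding, e.g. Encoding.GetEncoding("iso-8859-9");
// myUrl is the address of the page to load.
var web = new HtmlWeb
{
    AutoDetectEncoding = false,      // do not let the library guess the encoding
    OverrideEncoding = myEncoding,   // decode the response with this encoding instead
};
var doc = web.Load(myUrl);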

Note: The OverrideEncoding property appears to have been added to HTML Agility Pack in revision 76610, so it is not available in the current release, v1.4 (66017). Until a release includes it, the next best option is to download the page yourself and decode it with the encoding you want before handing the HTML to the parser.

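// Requires: using System.IO; using System.Net; using System.Text; using HtmlAgilityPack;
var document = new HtmlDocument();

using (var client = new WebClient())
using (var stream = client.OpenRead(url))
// Decode the response with an explicit encoding instead of relying on detection.
using (var reader = new StreamReader(stream, Encoding.GetEncoding("iso-8859-9")))
{
    var html = reader.ReadToEnd();
    document.LoadHtml(html);
}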

This is a simplified version of a solution posted in another answer here, which for some reason was deleted.
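To see why the encoding passed to the StreamReader matters, here is a minimal sketch that uses only the base class library, with no network access and no HtmlAgilityPack. The class name EncodingDemo and the helper Decode are hypothetical names made up for this illustration. It uses ISO-8859-1 rather than the ISO-8859-9 from the snippet above, on the assumption that ISO-8859-1 is available on every .NET runtime, whereas other code pages may first need an encoding provider to be registered on newer runtimes.

```csharp
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes the whole stream with the encoding the caller chooses.
    public static string Decode(Stream stream, Encoding encoding)
    {
        // The third argument (false) disables byte-order-mark detection,
        // so the supplied encoding is always the one that gets used.
        using (var reader = new StreamReader(stream, encoding, false))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, the bytes 0x63 0x61 0x66 0xE9 are "café" in ISO-8859-1. Passing them to EncodingDemo.Decode with Encoding.GetEncoding("iso-8859-1") gives back "café", while passing Encoding.UTF8 does not, because 0xE9 on its own is not a valid UTF-8 sequence. The same mismatch is what produces garbled characters when a page is parsed with the wrong encoding.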

Eric

A more complete answer, which handles auto-detecting the encoding and has some other nifty features, is over here:

C# and HtmlAgilityPack encoding problem
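The linked answer is not reproduced here, so the following is not its code. As an illustration of one common way to pick an encoding automatically, this sketch reads the charset parameter from an HTTP Content-Type header value and falls back to a caller-supplied default. The class name CharsetSniffer and the method FromContentType are hypothetical; the sketch assumes the header value has already been obtained (for example from the response headers) and does not look at meta tags inside the HTML.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class CharsetSniffer
{
    // Hypothetical helper: maps a Content-Type header value to an Encoding,
    // returning the fallback when no usable charset is present.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(contentType))
        {
            return fallback;
        }

        try
        {
            // ContentType parses values such as "text/html; charset=utf-8".
            var charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset)
                ? fallback
                : Encoding.GetEncoding(charset);
        }
        catch (FormatException)
        {
            return fallback; // malformed header value
        }
        catch (ArgumentException)
        {
            return fallback; // charset name not known to this runtime
        }
    }
}
```

The result could then be passed to the StreamReader in the manual-download approach in place of a hard-coded encoding. Servers sometimes send a wrong or missing charset, so a fixed override is still the more predictable choice when the page's real encoding is known.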
