C# and HtmlAgilityPack encoding problem

Submitted by 半腔热情 on 2019-11-28 09:40:50
Mikael Svenson

Actually the page is encoded with UTF-8.

GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt"), Encoding.UTF8);

will work.

Or you could use the code in my SO answer, which detects the encoding from the HTTP headers or meta tags and re-encodes properly. (It also supports gzip to minimize your download.)

With the download class your code would look like:

HttpDownloader downloader = new HttpDownloader("http://www.alfa.lt",null,null);
GodLikeHTML.LoadHtml(downloader.GetPage());
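
The detection order described above (HTTP header first, then meta tag) could be sketched roughly as follows. This is not the HttpDownloader class from the linked answer: EncodingSniffer, Detect, and Decode are hypothetical names used only for illustration, and the UTF-8 fallback and the 4096-byte scan window are assumptions. Note also that on .NET Core and later, legacy code pages such as windows-1257 resolve only after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack or the linked answer.
public static class EncodingSniffer
{
    // Matches "charset=..." in a Content-Type header or in a <meta> tag.
    private static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    public static Encoding Detect(string contentTypeHeader, byte[] body)
    {
        // 1. Prefer the charset declared in the HTTP Content-Type header.
        Encoding fromHeader = TryParse(contentTypeHeader);
        if (fromHeader != null) return fromHeader;

        // 2. Otherwise scan the start of the body for a meta charset.
        //    Charset names are ASCII, so an ASCII read is enough to find them.
        int prefixLength = Math.Min(body.Length, 4096);
        string head = Encoding.ASCII.GetString(body, 0, prefixLength);
        Encoding fromMeta = TryParse(head);
        if (fromMeta != null) return fromMeta;

        // 3. Assumed fallback when nothing is declared.
        return Encoding.UTF8;
    }

    public static string Decode(string contentTypeHeader, byte[] body)
    {
        return Detect(contentTypeHeader, body).GetString(body);
    }

    private static Encoding TryParse(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        Match match = CharsetPattern.Match(text);
        if (!match.Success) return null;
        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name: treat as "not declared".
            return null;
        }
    }
}
```

The decoded string can then be handed to LoadHtml, for example GodLikeHTML.LoadHtml(EncodingSniffer.Decode(contentType, bytes)), where contentType and bytes are whatever your own download code collected from the response.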

I had a similar encoding problem. With the most recent version of HtmlAgilityPack, I fixed it by setting OverrideEncoding when initializing HtmlWeb:

var htmlWeb = new HtmlWeb();
htmlWeb.OverrideEncoding = Encoding.UTF8;
var doc = htmlWeb.Load("http://www.alfa.lt");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// Replace Encoding.Default with the page's actual encoding.
StreamReader reader = new StreamReader(WebRequest.Create(YourUrl).GetResponse().GetResponseStream(), Encoding.Default);
doc.Load(reader);

hope it helps :)

UTF-8 didn't work for me, but after setting the encoding like this, most pages I was trying to scrape worked just fine:

web.OverrideEncoding = Encoding.GetEncoding("ISO-8859-1");

Perhaps it might help someone.

Try changing that to the following (code page 1257 is Windows-1257, the Baltic encoding):

GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt"), Encoding.GetEncoding(1257));

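This is my solution:

var doc = new HtmlAgilityPack.HtmlDocument();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.sina.com.cn");
byte[] barr;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (MemoryStream buffer = new MemoryStream())
{
    // ContentLength may be -1 and a single Read may return fewer bytes
    // than requested, so copy the whole stream instead.
    stream.CopyTo(buffer);
    barr = buffer.ToArray();
}
// Decode provisionally as UTF-8 so the declared charset can be inspected.
string data = Encoding.UTF8.GetString(barr);
// DetectEncodingHtml may return null when no encoding is declared.
Encoding encod = doc.DetectEncodingHtml(data) ?? Encoding.UTF8;
// Decode the raw bytes again with the detected encoding.
string convstr = encod.GetString(barr);
doc.LoadHtml(convstr);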

If none of the answers above works, just use this to decode the HTML entities in the text: WebUtility.HtmlDecode("Your html text");
