html-agility-pack

What is the best way to get the HTML for HTML Agility Pack to process?

◇◆丶佛笑我妖孽 submitted on 2019-12-02 07:42:17
I can't seem to get the HTML from a few sites, but I can from many others. Here are two sites I am having issues with: https://www.rei.com and https://www.homedepot.com. I am building an app that gets meta tag info from a URL that the user enters. Once I have the HTML, I process it using HTML Agility Pack, and that part works perfectly. The problem is getting the HTML from various websites. I have tried various ways to get the HTML (HtmlWeb, HttpWebRequest and others), all with the user-agent set (same agent string as Chrome), headers, cookies, auto-redirect and gzip handling, and seems like every
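The settings listed in the question (user-agent, cookies, redirects, gzip) can be combined in one place. This is a minimal sketch rather than a known fix: it uses HttpClient instead of the HtmlWeb and HttpWebRequest attempts the poster mentions, the class name PageFetcher and the header values are placeholders chosen for illustration, and a site that rejects automated clients on other grounds may still refuse the request.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper; the name and header values are illustrative assumptions.
public static class PageFetcher
{
    // Builds a client that follows redirects, keeps cookies and
    // transparently decompresses gzip/deflate responses.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            AllowAutoRedirect = true,
            UseCookies = true,
            CookieContainer = new CookieContainer(),
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(30) };

        // Placeholder browser-like headers; substitute the values your own browser sends.
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept", "text/html,application/xhtml+xml");
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept-Language", "en-US,en;q=0.9");
        return client;
    }

    // Downloads the page and hands the markup to Html Agility Pack.
    public static async Task<HtmlDocument> LoadAsync(HttpClient client, string url)
    {
        string html = await client.GetStringAsync(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

Downloading the string yourself and calling LoadHtml keeps the networking concerns separate from the parsing, so the request settings can be changed without touching the meta tag code.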

html agility pack remove children

谁说我不能喝 submitted on 2019-12-02 07:19:28
I'm having difficulty removing a div with a particular ID, along with its children, using the HTML Agility Pack. I am sure I'm just missing a config option, but it's Friday and I'm struggling. The simplified HTML runs: <html><head></head><body><div id='wrapper'><div id='functionBar'><div id='search'></div></div></div></body></html> This is as far as I have got. The error thrown by the Agility Pack shows it cannot find a div structure: <div id='functionBar'></div> Here's the code so far (taken from Stack Overflow): HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); //
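One way to drop a div together with everything inside it is to select the node by its id and call Remove on it. The sketch below assumes that approach; the class and method names are hypothetical, and the id is concatenated into the XPath expression, so it is only suitable for ids that contain no quote characters.

```csharp
using HtmlAgilityPack;

// Hypothetical helper illustrating node removal by id.
public static class NodeRemoval
{
    public static string RemoveDivById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches, so guard before removing.
        HtmlNode target = doc.DocumentNode.SelectSingleNode("//div[@id='" + id + "']");
        if (target != null)
        {
            // Remove detaches the node from its parent; its children go with it.
            target.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Called with the simplified HTML from the question and the id functionBar, this should leave the wrapper div in place with the functionBar and search divs gone.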

How to read JavaScript object with XPath/HTMLAgilityPack

浪子不回头ぞ submitted on 2019-12-02 05:47:59
For my crawler project, I need to get product details from a JavaScript object. How can I effectively get the object details from the following JavaScript? I use XPath and HTMLAgilityPack. <script type="text/javascript"> var product = { identifier: '2051189775', //PRODUCT ID fn: 'Fit- Whiskered Dark Wash Skirt', category: ['sale'], brand: 'Brand Name', price: '22.90', // this would be the discount price amount: '31.80', // this would be the original price currency: 'USD', //List can be even more. }; </script> I've not tried getting details from JavaScript objects before. I was getting details
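XPath can locate the script element, but it cannot look inside the JavaScript, so the object has to be read from the script text by other means. The sketch below assumes a regular expression is good enough for flat, single-quoted string fields such as identifier and price; it skips array values like category, and the class name ScriptObjectReader is hypothetical. A JavaScript or JSON parser would be the more robust choice for anything nested.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper: pulls name: 'value' pairs out of the "var product" script.
public static class ScriptObjectReader
{
    // Matches   name: 'value'   pairs; non-string values are deliberately skipped.
    private static readonly Regex Field =
        new Regex(@"(\w+)\s*:\s*'([^']*)'", RegexOptions.Compiled);

    public static Dictionary<string, string> ReadProduct(string html)
    {
        var result = new Dictionary<string, string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the page has no script elements.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts == null) return result;

        foreach (HtmlNode script in scripts)
        {
            string code = script.InnerHtml;
            if (!code.Contains("var product")) continue;

            foreach (Match m in Field.Matches(code))
            {
                result[m.Groups[1].Value] = m.Groups[2].Value;
            }
            break; // only the first matching script is read
        }
        return result;
    }
}
```

With the script from the question, the dictionary would hold entries such as identifier, fn, brand, price, amount and currency, each as a string.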

HtmlAgilityPack invalid markup

↘锁芯ラ submitted on 2019-12-02 05:12:31
I am using the HtmlAgilityPack from CodePlex. When I pass a simple HTML string into it and then get the resulting HTML back, it cuts off tags. Example: string html = "<select><option>test</option></select>"; HtmlDocument document = new HtmlDocument(); document.LoadHtml(html); var result = document.DocumentNode.OuterHtml; // result gives me: <select><option>test</select> So the closing tag for the option is missing. Am I missing a setting or using this wrong? Gabe I fixed this by commenting out line 92 of HtmlNode.cs in the source, compiled and it worked like a charm. ElementsFlags.Add("option",
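An alternative to recompiling the library, as the answer describes, is to remove the option entry from the static HtmlNode.ElementsFlags dictionary at run time before loading the document. The sketch below assumes that alternative; the class name is hypothetical, and because ElementsFlags is static the change affects every document parsed afterwards in the same process.

```csharp
using HtmlAgilityPack;

// Hypothetical helper: parses and re-serializes HTML with option treated as a normal element.
public static class OptionRoundTrip
{
    public static string Run(string html)
    {
        // Dictionary.Remove is a no-op when the key is absent, so this is safe
        // on library versions that no longer register a flag for option.
        HtmlNode.ElementsFlags.Remove("option");

        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document.DocumentNode.OuterHtml;
    }
}
```

Doing the removal once at application start-up, rather than on every call, would keep the global side effect in one visible place.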

Html Agility Pack c# Paragraph parsing problem

六月ゝ 毕业季﹏ submitted on 2019-12-02 04:58:59
I am having a couple of issues with my code. I am trying to pull every paragraph from a page, but at the moment it is only selecting the last paragraph. Here is my code: foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='body']/p")) { string text = node.InnerText; lblTest2.Text = text; } In your loop you are taking the current node's InnerText and assigning it to the label. You do this for each node, so of course you only see the last one: you are not preserving the previous ones. Try this: foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='body']/p")) { string
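The answer's point is that each pass through the loop overwrites the label, so the texts have to be collected first and assigned once. The answer's own code is cut off in this excerpt, so what follows is a separate sketch of that idea; ParagraphReader is a hypothetical name, and the label assignment is left to the caller.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: gathers the text of every paragraph instead of keeping only the last one.
public static class ParagraphReader
{
    public static List<string> ReadParagraphs(HtmlDocument doc)
    {
        var paragraphs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@id='body']/p");
        if (nodes == null) return paragraphs;

        foreach (HtmlNode node in nodes)
        {
            paragraphs.Add(node.InnerText.Trim());
        }
        return paragraphs;
    }
}
```

In the original page the result could then be shown with something like lblTest2.Text = string.Join(Environment.NewLine, ParagraphReader.ReadParagraphs(doc)).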

getting <a> tags and attribute with htmlagilitypack with vb.net

霸气de小男生 submitted on 2019-12-02 04:30:57
I have this code: Dim htmldoc As HtmlDocument = New HtmlDocument() htmldoc.LoadHtml(strPageContent) Dim root As HtmlNode = htmldoc.DocumentNode For Each link As HtmlNode In root.SelectNodes("//a") If link.HasAttributes("href") Then doSomething() 'this doesn't work because HasAttributes only checks whether an element has attributes or not Next but I am getting the error "Object reference not set to an instance of an object", even though the document contains at least one anchor tag. How do I check whether an attribute exists? I tried this: If link.HasAttributes("title") Then and got another error: Public ReadOnly
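Two separate things can go wrong in the loop above: SelectNodes returns null when no node matches, and HasAttributes is a property that takes no attribute name. The sketch below shows one way around both, written in C# rather than the question's VB.NET to match the other examples here; LinkReader is a hypothetical name. The same calls are available from VB.NET.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects href values from anchors that actually have one.
public static class LinkReader
{
    public static List<string> ReadHrefs(string html)
    {
        var hrefs = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The [@href] predicate keeps only anchors that carry the attribute.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null) return hrefs; // no matching anchors at all

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue returns the supplied default when the attribute is missing.
            string href = link.GetAttributeValue("href", string.Empty);
            if (href.Length > 0) hrefs.Add(href);
        }
        return hrefs;
    }
}
```

When the XPath cannot be narrowed, checking link.Attributes["href"] != null on each node is another way to test for a single attribute.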

Using HtmlAgilityPack with MonoTouch app gives reference error

孤者浪人 submitted on 2019-12-02 03:42:59
I'm trying to use the Html Agility Pack with a MonoTouch application, but cannot find a version that will work with it. I downloaded the latest binaries from CodePlex and I've tried building with every DLL it contains. None will compile when the target is the iPhone. Adding the .NET 20 library will allow it to compile to the iPhone Simulator, but when switching to the iPhone I get the error: Error MT2002: Can not resolve reference: System.Diagnostics.TraceListener (MT2002) (MFLPlatinum12) It seems like others are using HtmlAgilityPack with MonoTouch projects, so any thoughts as to what I'm

XHTML Parsing with HTMLAgilityPack

青春壹個敷衍的年華 submitted on 2019-12-02 02:45:28
I have a list of the following elements inside a <select> element that I have found using HTMLAgilityPack. <option value="67"><span style="color: #cc0000;">Horde</span> Leveling / Dailies & Event Guide ($50.00)</option> What I need to do is parse all the text out of the tag, without all the mumbo jumbo in there. I've tried (seemingly!) everything, but it always comes out looking like this: Horde Leveling / Dailies & Event Guide ($50.00) and sometimes like: Horde Leveling / Dailies & Event Guide ($50.00) and a couple of other variations like that. I've even gone so far as to print out each character in the
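For getting plain text out of an option, one approach is to read InnerText and then decode any HTML entities with HtmlEntity.DeEntitize. The sketch below assumes that approach and also removes the option entry from HtmlNode.ElementsFlags first, on the assumption that the library version in use would otherwise treat option as an empty element; OptionText is a hypothetical name, and the exact output the poster saw is not recoverable from this excerpt.

```csharp
using HtmlAgilityPack;

// Hypothetical helper: returns the visible text of the first option element.
public static class OptionText
{
    public static string Read(string html)
    {
        // Treat option as a normal container so its children stay inside it.
        // ElementsFlags is static, so this affects later parses as well.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode option = doc.DocumentNode.SelectSingleNode("//option");
        if (option == null) return string.Empty;

        // InnerText drops the span markup; DeEntitize decodes entities such as &amp;.
        return HtmlEntity.DeEntitize(option.InnerText).Trim();
    }
}
```

If the markup inside the option arrives entity-encoded (for example as &lt;span&gt; text rather than a real span element), the decoded string will still contain tags and would need a second parse.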