html-agility-pack

C# parse html with xpath

佐手、 submitted on 2019-12-01 17:50:12
I'm trying to parse stock exchange information out of an HTML document with a simple piece of C#. The problem is that I can't get my head around the syntax: the tr with class="LomakeTaustaVari" gets parsed out, but how do I get the second row, whose tr has no class? Here's a piece of the HTML; it repeats itself with different values.

<tr class="LomakeTaustaVari">
  <td><div class="Ensimmainen">12:09</div></td>
  <td><div>MSI</div></td>
  <td><div>POH</div></td>
  <td><div>42</div></td>
  <td><div>64,50</div></td>
</tr>
<tr>
  <td><div class="Ensimmainen">12:09</div></td>
  <td><div>SRE</div></td>
  <td><div>POH<
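One possible approach, offered as a sketch rather than as the accepted answer to this question: select rows by what they contain instead of by the class on the tr. The XPath `//tr[td/div[@class='Ensimmainen']]` matches every row that holds the time cell, with or without a class, and `//tr[not(@class)]` would match only the unclassed rows. The table wrapper and the second row's values below are made up for illustration, since the original HTML is cut off.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class RowDemo
{
    static void Main()
    {
        // First row is taken from the question; the second row's values are hypothetical.
        string html = @"<table>
<tr class=""LomakeTaustaVari"">
  <td><div class=""Ensimmainen"">12:09</div></td>
  <td><div>MSI</div></td><td><div>POH</div></td>
  <td><div>42</div></td><td><div>64,50</div></td>
</tr>
<tr>
  <td><div class=""Ensimmainen"">12:10</div></td>
  <td><div>ABC</div></td><td><div>XYZ</div></td>
  <td><div>7</div></td><td><div>1,25</div></td>
</tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match every row containing the time cell, whether or not the <tr> has a class.
        var rows = doc.DocumentNode.SelectNodes("//tr[td/div[@class='Ensimmainen']]");
        if (rows == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            // The leading "./" keeps the search inside the current row.
            var cells = row.SelectNodes("./td/div");
            if (cells == null) continue;

            var values = new List<string>();
            foreach (HtmlNode cell in cells)
                values.Add(cell.InnerText.Trim());

            Console.WriteLine(string.Join(" | ", values));
        }
    }
}
```

Selecting on the cell content avoids depending on the alternating row class, which is presumably there only for styling.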

From the Html Agility Pack download, which one of the 9 “HtmlAgilityPack.dll” do I use?

谁说我不能喝 submitted on 2019-12-01 17:43:30
Question: There are nine folders in the downloaded zip file for HTML Agility Pack: Net20, Net40, Net40-client, Net45, sl3-wp, sl4, sl4-windowsphone71, sl5, winrt45. I do not know what these folder names mean. Please explain which one I need in order to scrape data from HTML files using VS2010, and where I should put the files.

Answer 1: The different versions are compiled against different .NET framework versions. Some frameworks, such as the WinRT or the Silverlight frameworks, have more limited

HTMLAgilityPack get innerText of a td tag with an id attribute

泪湿孤枕 submitted on 2019-12-01 17:31:26
I am trying to select the inner text of a td with an id attribute using the HTMLAgilityPack.

Html Code:

<td id="header1"> 5 </td>
<td id="header2"> 8:39pm </td>
<td id="header3"> 8:58pm </td>
...

Code:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(data);
var nodes = doc.DocumentNode.SelectNodes("//td[@id='header1']");
if (nodes != null)
{
    foreach (HtmlAgilityPack.HtmlNode node in nodes)
    {
        MessageBox.Show(node.InnerText);
    }
}

I keep getting null nodes because I am not selecting the td tag correctly, but I cannot figure out what I have done wrong... Edit: I made

Set InnerText with HtmlAgilityPack

て烟熏妆下的殇ゞ submitted on 2019-12-01 16:39:27
Question: I've tried to set InnerText using the following, but I'm not allowed to set the InnerText property:

node.InnerText = node.InnerText.Remove(100) + "..";

The reason for this is that I only want to remove text, not actual elements:

<div> Lorem ipsum dolor sit amet, consectetur adipiscing elit. <img src="" /> </div>

Answer 1: I have just run into the same problem myself. Although the documentation says "get or set", the property clearly is read-only. But inner text applies to EVERYTHING between the tags, so if you have hundreds of children, ALL of their text, including actual tags, will be there. I think to do what you and
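A minimal sketch of one way to shorten the text while leaving elements such as the img in place, assuming the goal is a character limit across the whole div. It edits the individual text nodes through `HtmlTextNode.Text`, which is writable, instead of assigning to `InnerText`. The `budget` value and the class name are hypothetical, and this is not necessarily what the truncated answer above went on to suggest.

```csharp
using System;
using HtmlAgilityPack;

class TruncateDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div> Lorem ipsum dolor sit amet, consectetur adipiscing elit. <img src=\"\" /> </div>");

        int budget = 20; // hypothetical limit on the number of characters to keep

        // text() selects only text nodes, so elements such as <img> are never touched.
        var textNodes = doc.DocumentNode.SelectNodes("//div//text()");
        if (textNodes != null)
        {
            foreach (HtmlNode node in textNodes)
            {
                var textNode = node as HtmlTextNode;
                if (textNode == null) continue;

                string text = textNode.Text;
                if (budget <= 0)
                {
                    // Limit already reached: blank out any later text.
                    textNode.Text = "";
                }
                else if (text.Length > budget)
                {
                    // Cut this node at the limit and mark the truncation.
                    textNode.Text = text.Substring(0, budget) + "..";
                    budget = 0;
                }
                else
                {
                    budget -= text.Length;
                }
            }
        }

        // The markup keeps its elements; only the text has been shortened.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Note that this counts raw characters, including whitespace and any HTML entities in the source text.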

HtmlAgilityPack and Authentication

岁酱吖の submitted on 2019-12-01 16:34:20
Question: I have a method that gets ids and XPaths for a given URL. How do I pass in the username and password with the request so that I can scrape a URL that requires a username and password?

using HtmlAgilityPack;

_web = new HtmlWeb();

internal Dictionary<string, string> GetidsAndXPaths(string url)
{
    var webidsAndXPaths = new Dictionary<string, string>();
    var doc = _web.Load(url);
    var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
    if (nodes == null) return webidsAndXPaths;
    // code to get

Extracting Inner text from HTML BODY node with Html Agility Pack

给你一囗甜甜゛ submitted on 2019-12-01 15:35:58
Need a bit of help with HTML Agility Pack! Basically, I want to grab the plain text within the body node of the HTML. So far I have tried this in VB.NET, and it fails to return the inner text, meaning no change is seen, at least from what I can see.

Dim htmldoc As HtmlDocument = New HtmlDocument
htmldoc.LoadHtml(html)
Dim paragraph As HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//body")
If Not htmldoc Is Nothing Then
    For Each node In paragraph
        node.ParentNode.RemoveChild(node, True)
    Next
End If
Return htmldoc.DocumentNode.WriteContentTo

I have tried this: Return htmldoc.DocumentNode
