html-agility-pack | 易学教程

XPath selecting between comments multiple times

Question: We need to read the nodes between HTML comments: <html>  <div>some text</div> <div><p>Some more elements</p></div>   <div>some text</div> <div><p>Some more elements</p>  </div> </html> I tried the XPath below: //*[preceding-sibling::comment()[contains(., 'comment 1')]][following-sibling::comment()[contains(., 'end content')]] It works for the first comment, i.e. comment 1, but does not work for the second comment following

HTMLAgilityPack, HTML duplicate IDs

Question: This is similar to this one here, but it needs to be done at the server level rather than at the client level. I currently use HTMLAgilityPack; is there any way I could detect duplicate IDs? Thanks in advance. Answer 1: Here's a quick way to do it: HtmlDocument doc = new HtmlDocument(); doc.LoadHtml(htmlString); var count = new Dictionary<string, int>(); foreach (var node in doc.DocumentNode.Descendants()) { string id = node.GetAttributeValue("id", null); if (id != null) { if (count.ContainsKey(id))
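The answer excerpt above is cut off mid-loop, so the following is only a minimal sketch of how the same counting idea could be finished, not the answerer's actual code. The class name `DuplicateIdFinder` and the method `FindDuplicateIds` are hypothetical names introduced for illustration; the HtmlAgilityPack calls are the ones already shown in the excerpt.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not from the original answer).
public static class DuplicateIdFinder
{
    // Returns every id value that occurs more than once, mapped to its occurrence count.
    public static Dictionary<string, int> FindDuplicateIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var counts = new Dictionary<string, int>();
        foreach (var node in doc.DocumentNode.Descendants())
        {
            // GetAttributeValue returns the supplied default (null here)
            // when the node has no id attribute.
            string id = node.GetAttributeValue("id", null);
            if (id == null) continue;

            counts[id] = counts.TryGetValue(id, out int seen) ? seen + 1 : 1;
        }

        // Keep only the ids that were seen more than once.
        return counts.Where(p => p.Value > 1)
                     .ToDictionary(p => p.Key, p => p.Value);
    }
}
```

Returning the counts rather than a simple yes/no makes it possible to report which ids collide and how often.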

Scraping HTML from Google Translate

Question: I want to translate a string using Google Translator. My sample string is "this is my string". I want to use HTML Agility Pack to parse HTML documents. I tried this: using HtmlAgilityPack; ........ var webGet = new HtmlWeb(); var document = webGet.Load( "http://translate.google.com/#en/bn/this%20is%20my%20string"); var node = document.DocumentNode.SelectNodes( "//span[@class='short_text' and @id='result_box']"); if (node != null) { foreach (var xx in node) { x = xx.InnerText; MessageBox.Show

C# HTML Agility Pack Single Select Node returning null

Question: I have a web scraper developed using C#, Windows Forms and the HTML Agility Pack. I had it all working great until the site changed its code and broke it. I know this happens often with web scrapers, but now I am having trouble figuring out how to correct the issue. At this time my scraper loops through multiple URLs and scrapes data from each page. The problem I am running into is that the template of the site it loops through will randomly show the newer template which does not have the same HTML

select divs and put into collection using htmlagilitypack not working

Question: Why does this not work? I get a null reference exception on the foreach loop as it starts. I'm trying to get the text of all the divs on a page and put each one into my own collection. Imports HtmlAgilityPack Imports System.Xml Partial Class _Default Inherits System.Web.UI.Page Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load Dim webGet As HtmlWeb = New HtmlWeb Dim htmlDoc As HtmlDocument = webGet.Load("http://www.mysite.com") Dim ids As New List(Of

Can't figure how to parse using HTML Agility Pack

Question: I have the following chunk of HTML code, but I can't figure out how to get the designated values: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> </head> <body > <form name="form1" method="post" action="" id="form1"> <div> <table class="tableclass" > <tbody> <tr> <tr> <td colspan="5" class="myclass1"><span id="myclass2">value1</span></td> </tr> <tr id="idvalue" aa="1" class

Ignoring parse errors with Html Agility Pack?

Question: I'm trying to parse a single page from YouTube, which isn't really free of syntax errors. Html Agility Pack screams about these errors and returns nothing as a result. http://codepaste.net/gh3hco Answer 1: I haven't tried this, but based on a suggestion in their forum you can use HTML Tidy or Tidy.NET to clean the HTML first. Alternatively, you could find the erroneous tags and remove them in a pre-processing step. Source: https://stackoverflow.com/questions/6182404/ignoring-parse-errors-with-html

Html Agility Pack nextsibling not finding element if there are white spaces between tags

Question: I'm trying to find the innerText of the nextSibling of a specific tag, but I can't get the proper value when I parse an HTML string that contains white space or new lines. Here is my code: private string getTableTdValue() { /*If I strip all white space between tag I get proper results string myHtml = "<td align='right' width='186'>Text1</td><td align='center' width='51'>Here the result I want to get</td><td width='186'>Text2</td>";*/ string myHtml = @" <td align='right' width='186'>Text1</td
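The behaviour described in the question is consistent with how HtmlAgilityPack builds its tree: white space between tags becomes a text node, so `NextSibling` can land on that text node instead of the next element. A minimal sketch of one way to skip past such nodes follows; `SiblingHelper` and `NextElementSibling` are hypothetical names introduced here and are not taken from the question or from any answer to it.

```csharp
using HtmlAgilityPack;

// Hypothetical helper (illustrative only).
public static class SiblingHelper
{
    // Walks NextSibling until an element node is reached.
    // Returns null when no element sibling follows the given node.
    public static HtmlNode NextElementSibling(HtmlNode node)
    {
        HtmlNode next = node.NextSibling;
        while (next != null && next.NodeType != HtmlNodeType.Element)
        {
            // Skip text nodes (including white space) and comments.
            next = next.NextSibling;
        }
        return next;
    }
}
```

An XPath such as `following-sibling::td[1]` evaluated from the starting node would be another way to express the same intent, since the element name test never matches text nodes.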

HtmlAgilityPack-PCL + LINQ

Question: I have a Windows Phone 8.1 app that's supposed to download the HTML file and parse it using HtmlAgilityPack-PCL and LINQ. var nodes = from tr in doc.DocumentNode.Descendants("body") from td in tr.Descendants("div").Where(x => x.Attributes["id"].Value == "screen")select tr; Then I'm trying to get the node from nodes: HtmlNode node = nodes.FirstOrDefault(); And this is the point where I get the exception "Object reference not set to an instance of an object." The html file
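One plausible cause, offered as an assumption because the excerpt is truncated, is that `x.Attributes["id"]` returns null for any div without an id attribute, so reading `.Value` on it throws once the query is enumerated. A minimal sketch of a null-safe lookup is shown below; `ScreenDivFinder` and `FindDivById` are hypothetical names introduced for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (illustrative only).
public static class ScreenDivFinder
{
    // Returns the first div whose id equals the requested value, or null if none matches.
    public static HtmlNode FindDivById(HtmlDocument doc, string id)
    {
        return doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue falls back to "" when the attribute is missing,
            // so divs without an id are skipped instead of throwing.
            .FirstOrDefault(d => d.GetAttributeValue("id", "") == id);
    }
}
```

Note that this returns the matching div itself, whereas the query in the question selects the enclosing body element; which of the two is wanted depends on the rest of the question.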

HTML Agility Pack - Select node after particular paragraph

Question: I have various files with the following HTML. I need to retrieve only the list after the "targetWord" paragraph (of course, its position changes across the pages I need to parse). How can I do this with HTML Agility Pack? <p>Word1</p> <ul> <li>listobject1</li> <li>listobject2</li> <li>listobject3</li> </ul> <p>targetWord</p> <ul> <li>listobject4</li> <li>listobject5</li> <li>listobject6</li> </ul> <p>Word2</p> <ul> <li>listobject7</li> <li>listobject8</li> <li>listobject9</li> </ul>
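A minimal sketch of one possible approach, assuming the paragraph text matches the target word exactly: select the paragraph with XPath and take the first `ul` on its `following-sibling` axis. `ListAfterParagraph` and `ItemsAfter` are hypothetical names introduced here; this is an illustration, not an answer taken from the original thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (illustrative only).
public static class ListAfterParagraph
{
    // Returns the text of each li in the first ul that follows
    // the paragraph whose text equals the given word.
    // Assumes the word contains no apostrophe, since it is embedded in the XPath literal.
    public static List<string> ItemsAfter(string html, string word)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode list = doc.DocumentNode.SelectSingleNode(
            "//p[text()='" + word + "']/following-sibling::ul[1]");

        // SelectSingleNode returns null when nothing matches.
        if (list == null) return new List<string>();

        return list.Elements("li")
                   .Select(li => li.InnerText.Trim())
                   .ToList();
    }
}
```

Because the XPath is anchored on the paragraph rather than on a position in the page, it does not matter where the "targetWord" block appears in each file.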