html-agility-pack

How to make asynchronous calls using HtmlAgilityPack?

浪子不回头ぞ submitted on 2019-12-06 05:53:25
I'm trying to get the table with id table-matches available here. The problem is that the table is loaded using AJAX, so I don't get the full HTML when I download the page:

    string url = "http://www.oddsportal.com/matches/soccer/20180701/";
    using (HttpClient client = new HttpClient())
    {
        using (HttpResponseMessage response = client.GetAsync(url).Result)
        {
            using (HttpContent content = response.Content)
            {
                string result = content.ReadAsStringAsync().Result;
            }
        }
    }

The HTML returned does not contain any table, so I tried to see whether the problem was the library; in fact, I set in Chrome
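
The code in the question blocks on `.Result`. Below is a minimal sketch of the same download written with `async`/`await`, assuming the HtmlAgilityPack NuGet package is referenced; the class and method names (`MatchesPage`, `DownloadAsync`, `FindMatchesTable`, `LoadMatchesTableAsync`) are hypothetical. Note that awaiting the request only stops the calling thread from blocking; it does not change what the server sends, so a table that the page fills in later via AJAX will still be missing from the downloaded HTML.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class MatchesPage
{
    // Awaits the request instead of blocking on .Result, so the calling thread is free
    // while the download runs (and the .Result deadlock under a UI or classic ASP.NET context is avoided).
    public static async Task<string> DownloadAsync(HttpClient client, string url)
    {
        using HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    // Returns <table id="table-matches">, or null when the HTML does not contain it
    // (the case when the table is inserted later by JavaScript/AJAX).
    public static HtmlNode FindMatchesTable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//table[@id='table-matches']");
    }

    // Download and parse; pass in one long-lived HttpClient rather than creating one per call.
    public static async Task<HtmlNode> LoadMatchesTableAsync(HttpClient client, string url)
    {
        string html = await DownloadAsync(client, url);
        return FindMatchesTable(html);
    }
}
```

Usage, inside an `async` method: `HtmlNode table = await MatchesPage.LoadMatchesTableAsync(client, url);`. If `table` is null, the markup you need is produced by script, and you would have to request the AJAX endpoint the page calls (visible in the browser's network tab) or render the page with a browser engine.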

HtmlAgilityPack: xpath and regex

≡放荡痞女 submitted on 2019-12-06 05:44:53
I'm currently using HtmlAgilityPack to search for certain content via an XPath query, something like this: var col = doc.DocumentNode.SelectNodes("//*[text()[contains(., 'foo'] or @*.... Now I want to search for specific content in all of the HTML source code (text, tags and attributes) using a regular expression. How can this be achieved with HtmlAgilityPack? Can HtmlAgilityPack handle XPath plus regex, or what would be the best way to combine a regex with HtmlAgilityPack for searching?

The Html Agility Pack uses the underlying .NET XPath implementation for its XPath support. Fortunately XPath in .NET is
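
The answer above is cut off. One common approach, shown here as a sketch rather than as the answer's own method, is to let HtmlAgilityPack enumerate the elements and apply `System.Text.RegularExpressions.Regex` to each element's tag name, attribute values and own text in LINQ. It assumes the HtmlAgilityPack NuGet package is referenced; `RegexSearch.FindMatches` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class RegexSearch
{
    // Returns every element whose tag name, any attribute value, or direct text child
    // matches the pattern. Only direct text is tested (not InnerText), so ancestors
    // are not reported just because some descendant matches.
    public static List<HtmlNode> FindMatches(HtmlDocument doc, Regex pattern)
    {
        return doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .Where(n =>
                pattern.IsMatch(n.Name) ||
                n.Attributes.Any(a => pattern.IsMatch(a.Value ?? string.Empty)) ||
                n.ChildNodes.Any(c => c.NodeType == HtmlNodeType.Text && pattern.IsMatch(c.InnerText)))
            .ToList();
    }
}
```

XPath still does the structural work if you need it: run `SelectNodes` first and apply the regex filter to its result instead of to `Descendants()`.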

Library to generate .NET XmlDocument from HTML tag soup

一世执手 submitted on 2019-12-06 05:36:02
I'm looking for a .NET library that can generate a clean XML tree, ideally a System.Xml.XmlDocument, from invalid HTML code, i.e. it should make the kind of best-effort guesses, repairs, and substitutions browsers make when confronted with this situation, and generate a pretend XmlDocument. The library should also be well maintained. :) I realize this is a lot (too much?) to ask, and I would appreciate any useful leads. There seem to be a fair number of implementations of this for Java, but I would rather not write my own bindings. So far for .NET I have found http://www.majestic12.co.uk

How to Get element by class in HtmlAgilityPack

ⅰ亾dé卋堺 submitted on 2019-12-06 03:42:22
Hello, I'm making an HttpWebResponse and getting the HTML page with all the data I need, for example a table with date info that I need to save to an array list and then to an XML file. Example of the HTML page:

    <table>
      <tr>
        <td class="padding5 sorting_1">
          <span class="DateHover">01.03.14</span>
        </td>
        <td class="padding5 sorting_1">
          <span class="DateHover" >10.03.14</span>
        </td>
      </tr>
    </table>

My code, which is not working (I'm using HtmlAgilityPack):

    private static string GetDataByIClass(string HtmlIn, string ClassToGet)
    {
        HtmlAgilityPack.HtmlDocument DocToParse = new HtmlAgilityPack.HtmlDocument();
        DocToParse
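
The `class` attributes in this example hold several space-separated classes (`padding5 sorting_1`), so an exact `@class='...'` comparison will not match the `td` elements. A minimal sketch, assuming the HtmlAgilityPack NuGet package and a class name without quote characters, matches whole class tokens with the usual XPath 1.0 idiom; `ClassLookup.GetTextByClass` is a hypothetical name modelled on the question's `GetDataByIClass`.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class ClassLookup
{
    // Returns the trimmed inner text of every element carrying the given CSS class.
    // class="a b" holds several classes, so whole space-separated tokens are compared
    // instead of using @class='...'. Assumes className contains no quote characters.
    public static List<string> GetTextByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return new List<string>();

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }
}
```

`GetTextByClass(html, "DateHover")` returns the two dates in document order, ready to be added to a list and written to XML.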

Get innertext between two tags - VB.NET - HtmlAgilityPack

99封情书 submitted on 2019-12-06 03:41:45
I'm using HtmlAgilityPack and I want to get the inner text between two specific tags, for example:

    <a name="a"></a>Sample Text<br>

I want to get the inner text between </a> and <br>, i.e. "Sample Text". How can I do it? Thanks in advance.

Once you have reached the anchor, you can use the NextSibling property:

    Dim doc = New HtmlDocument()
    doc.LoadHtml("<html><body><a name=""a""></a>Sample Text<br></body></html>")
    Dim a = doc.DocumentNode.SelectSingleNode("//a[@name=""a""]")
    Console.WriteLine(a.NextSibling.InnerText)

Source: https://stackoverflow.com/questions/7291644/get-innertext-between-two-tags-vb-net
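
`NextSibling` returns only the single node directly after the anchor, which is enough for the example. If the text before `<br>` may itself contain inline tags (for example `Sample <b>bold</b> Text`), one way to generalize, sketched here in C# under the assumption that the HtmlAgilityPack NuGet package is referenced, is to keep walking siblings until the stop tag; `BetweenTags.TextUntil` is a hypothetical helper.

```csharp
using System.Text;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class BetweenTags
{
    // Collects the text of every sibling node after `start`, stopping at the first
    // element named `stopTag` (or at the end of the parent). Unlike a single
    // NextSibling call, this also covers text split up by inline tags such as <b>.
    public static string TextUntil(HtmlNode start, string stopTag)
    {
        var sb = new StringBuilder();
        for (HtmlNode node = start.NextSibling; node != null; node = node.NextSibling)
        {
            if (node.NodeType == HtmlNodeType.Element && node.Name == stopTag)
                break;
            sb.Append(node.InnerText);
        }
        return sb.ToString();
    }
}
```

`BetweenTags.TextUntil(a, "br")` returns `Sample Text` for the markup in the question. HtmlAgilityPack reports element names in lower case, so pass the stop tag in lower case.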

XPath to first occurrence of element with text length >= 200 characters

南笙酒味 submitted on 2019-12-06 03:40:18
Question: How do I get the first element whose inner text (plain text only, discarding child elements) is 200 or more characters long? I'm trying to create an HTML parser like Embed.ly, and I've set up a system of fallbacks: I first check for og:description, then I search for this occurrence, and only then for the description meta tag. This is because most sites that include a meta description at all use it to describe the site as a whole rather than the contents of the current page. Example:
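
A pure XPath 1.0 expression such as `(//*[string-length(normalize-space(text())) >= 200])[1]` comes close, but when `text()` is converted to a string only the element's first text node counts. A sketch that joins all of an element's own text nodes instead, assuming the HtmlAgilityPack NuGet package (`LongText`, `OwnText` and `FirstLongTextElement` are hypothetical names):

```csharp
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LongText
{
    // Concatenates only the element's own text children; nested elements are ignored.
    public static string OwnText(HtmlNode element) =>
        string.Concat(element.ChildNodes
            .Where(c => c.NodeType == HtmlNodeType.Text)
            .Select(c => c.InnerText)).Trim();

    // First element in document order whose own text is at least minLength characters,
    // or null if there is none. <script> and <style> are skipped because their text is code.
    public static HtmlNode FirstLongTextElement(HtmlDocument doc, int minLength = 200) =>
        doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element && n.Name != "script" && n.Name != "style")
            .FirstOrDefault(n => OwnText(n).Length >= minLength);
}
```

Skipping `script` and `style` is a design choice for a description extractor; drop that filter if you want the first long text of any kind.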

Can HtmlAgilityPack handle an xml file that comes with an xsl file to render html?

你说的曾经没有我的故事 submitted on 2019-12-06 03:37:42
Question: I was wondering about the best way for HtmlAgilityPack to read an XML file that references an XSL file to render HTML. Are there any settings on the HtmlDocument class that would help with this, or do I have to find a way to execute the transformation before loading the result with HtmlAgilityPack? If the latter, does anybody know a good library or method for such a transformation? Below is an example of a website that returns XML with an XSL file, and the code I would like to use. var uri = new Uri(
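
A sketch of the second option: run the transformation yourself with .NET's built-in `XslCompiledTransform` (System.Xml.Xsl) and hand the resulting HTML to HtmlAgilityPack. It assumes you have already downloaded both the XML and the stylesheet it references as strings, and that the HtmlAgilityPack NuGet package is referenced; `XslToHtml.Transform` is a hypothetical name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class XslToHtml
{
    // Applies the XSLT stylesheet to the XML document (both given as strings)
    // and parses the resulting HTML with HtmlAgilityPack.
    public static HtmlDocument Transform(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }

        var output = new StringWriter();
        using (var xmlReader = XmlReader.Create(new StringReader(xml)))
        {
            transform.Transform(xmlReader, null, output); // null: no XSLT parameters
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(output.ToString());
        return doc;
    }
}
```

For the website case, the stylesheet's address is usually given by the `<?xml-stylesheet href="..."?>` processing instruction at the top of the XML, so you would download that file as well before calling `Transform`.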

NullReferenceException in HtmlAgilityPack

与世无争的帅哥 submitted on 2019-12-06 03:31:28
Question: I am trying to extract a link using XPath from the URL below:

    string url = "http://www.album-cover-art.org/search.php?q=Ruin+-+Live+Album+Version+Lamb+of+God";

My code:

    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc = web.Load(url); //Exception generated here Line 23
    if (htmlDoc.DocumentNode != null)
    {
        HtmlNode linkNode = htmlDoc.DocumentNode.SelectSingleNode(".//*[@id='related_search_row'
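
The answer is not part of this excerpt. Independently of what fails inside `web.Load`, the next step is a frequent source of the same exception: `SelectSingleNode` returns null when the XPath matches nothing, so dereferencing its result throws `NullReferenceException`. A defensive sketch, assuming the HtmlAgilityPack NuGet package (`LinkExtractor.FirstLinkIn` is hypothetical, and the id must not contain quote characters):

```csharp
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LinkExtractor
{
    // Returns the href of the first <a href> inside the element with the given id,
    // or null if either the container or the link is missing.
    // SelectSingleNode returns null on no match, so each step is guarded with ?.
    // instead of being dereferenced directly. Assumes containerId has no quotes.
    public static string FirstLinkIn(HtmlDocument doc, string containerId)
    {
        HtmlNode link = doc.DocumentNode
            .SelectSingleNode("//*[@id='" + containerId + "']")
            ?.SelectSingleNode(".//a[@href]");
        return link?.GetAttributeValue("href", "");
    }
}
```

`FirstLinkIn(htmlDoc, "related_search_row")` then returns null instead of throwing when the page does not contain the expected markup.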

Is there an XmlReader equivalent for HTML in .Net?

断了今生、忘了曾经 submitted on 2019-12-05 19:45:27
I've used HtmlAgilityPack in the past to parse HTML in .NET, but I don't like the fact that it only offers a DOM model. On large documents, or on those with deep nesting, it is possible to hit stack overflow or out-of-memory exceptions. Also, a DOM-based parsing model generally uses significantly more memory than a streaming approach, and much of that memory is wasted when the process consuming the HTML only needs a few elements at a time. Does anyone know of a decent HTML parser for .NET that lets you parse HTML in a manner similar to the XmlReader class, i.e. in a

Where's the bug in this tree traversal code?

风流意气都作罢 submitted on 2019-12-05 19:05:37
There's a bug in Traverse() that's causing it to iterate nodes more than once.

Bugged code:

    public IEnumerable<HtmlNode> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }

    public SharpQuery Children()
    {
        return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
    }

    public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
    {
        if (nodes == null) throw new ArgumentNullException("nodes");
        _previous = previous;
        _context = new List<HtmlNode>(nodes);
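
The loop calls `Children().Traverse()` once per node in `_context`, and `Children()` collects the children of every node in the context, not just the current one. Each child subtree is therefore emitted once per context node, and children of later nodes show up under earlier ones. The self-contained sketch below reproduces this with a hypothetical `Node` class standing in for `HtmlNode`/`SharpQuery`, next to a fixed version that recurses into the current node's own children only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HtmlNode: a name plus a list of child elements.
public sealed class Node
{
    public string Name { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

public static class TreeWalk
{
    // Mirrors the bugged Traverse(): the recursion uses the children of the WHOLE
    // context and runs once per context node, so subtrees are emitted repeatedly.
    public static IEnumerable<Node> TraverseBuggy(IReadOnlyList<Node> context)
    {
        foreach (var node in context)
        {
            yield return node;
            foreach (var child in TraverseBuggy(context.SelectMany(n => n.Children).ToList()))
                yield return child;
        }
    }

    // Fixed: recurse into the current node's own children only (pre-order, document order).
    public static IEnumerable<Node> Traverse(IEnumerable<Node> context)
    {
        foreach (var node in context)
        {
            yield return node;
            foreach (var descendant in Traverse(node.Children))
                yield return descendant;
        }
    }
}
```

With roots `A` (child `C`) and `B` (child `D`), the bugged walk yields `A, C, D, B, C, D` while the fixed one yields `A, C, B, D`. In `SharpQuery` terms, the fix means recursing on a query built from the current node's children rather than on `Children()` of the whole context.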