html-agility-pack

How do I remove whitespace in HTML Source with Html Agility Pack and C#

佐手、 submitted on 2019-12-11 00:15:23
Question: Before posting, I tried the solution from this thread: C# - Remove spaces in HTML source in between markups? Here is a snippet of the HTML I'm working with: <p>This is my text</p> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <p>This is next text</p> I'm using HTML Agility Pack to clean up the HTML: HtmlDocument doc = new HtmlDocument(); doc.Load(htmlLocation); foreach (var item in doc.DocumentNode.Descendants("p").ToList()) { if (item.InnerHtml == " ") { item.Remove(); } } The output

Returns a null from html node

前提是你 submitted on 2019-12-10 22:15:46
Question: I am attempting to access the company name from this page. It should return a node with an InnerText of "Cascade corporation"; however, I get null instead. HtmlNode htest = document.DocumentNode.SelectSingleNode("//*[@id='appbar']/div/div[2]/div[1]/span"); What am I missing? P.S. It must work with Chrome. Answer 1: I tried to reproduce your issue on my machine. I captured the request and response data using Fiddler. I was surprised to notice that the rendered HTML output from the browser is different from my code. From

HtmlAgilityPack giving exception “Multiple node elments can't be created.”

痴心易碎 submitted on 2019-12-10 20:55:18
Question: I have some input tags that are placeholders that I am replacing with some HTML. I am using the code snippet below to create an HTML node, but it gives the error "Multiple node elements can't be created" even though there are no multiple nodes. string tempString = "<p style="margin-left:0px;margin-right:0px;text-indent:0px;text-align:justify;">(c)<span style='display: inline-block; width: 30px; min-width: 30px;'> </span><span class='noCount4'> </span>paragraph <span class="Ellh_">(a)

Remove chain of duplicate elements with HTML Agility Pack

*爱你&永不变心* submitted on 2019-12-10 18:06:31
Question: I'm trying to remove any run of two or more consecutive <br> tags in my HTML document. This is what I've come up with so far (really stupid code): HtmlNodeCollection elements = nodeCollection.ElementAt(0) .SelectNodes("//br"); if (elements != null) { foreach (HtmlNode element in elements) { if (element.Name == "br") { bool iterate = true; while(iterate == true) { iterate = removeChainElements(element); } } } } private bool removeChainElements(HtmlNode element) { if (element

Parsing html using agility pack

混江龙づ霸主 submitted on 2019-12-10 17:48:24
Question: I have some HTML to parse (see below): <div id="mailbox" class="div-w div-m-0"> <h2 class="h-line">InBox</h2> <div id="mailbox-table"> <table id="maillist"> <tr> <th>From</th> <th>Subject</th> <th>Date</th> </tr> <tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;"> <td>no-reply@somemail.net</td> <td> <a href="readmail.html?mid=welcome">Hi, Welcome</a> </td> <td> <span title="2016-02-16 13:23:50 UTC">just now</span> </td> </tr> <tr onclick="location='readmail.html?mid

HTML Agility Pack get all input fields

China☆狼群 submitted on 2019-12-10 16:32:24
Question: I found some code on the internet that finds all the href attributes and changes them to google.com, but how can I tell the code to find all the input fields and put custom text in them? This is the code I have right now: HtmlDocument doc = new HtmlDocument(); doc.Load(path); foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")) { HtmlAttribute att = link.Attributes["href"]; att.Value = "http://www.google.com"; } doc.Save("file.htm"); Please, can someone help me, I can't seem to

Work-around a StackOverflowException

牧云@^-^@ submitted on 2019-12-10 13:45:50
Question: I'm using HtmlAgilityPack to parse roughly 200,000 HTML documents. I cannot predict the contents of these documents; however, one such document causes my application to fail with a StackOverflowException. The document contains this HTML: <ol> <li><li><li><li><li><li>... </ol> There are roughly 10,000 <li> elements nested like that. Due to the way HtmlAgilityPack parses HTML, it causes a StackOverflowException. Unfortunately, a StackOverflowException is not catchable in .NET 2.0 and later. I

Losing the 'less than' sign in HtmlAgilityPack loadhtml

懵懂的女人 submitted on 2019-12-10 12:55:00
Question: I recently started experimenting with the HtmlAgilityPack. I am not familiar with all of its options, and I think I am therefore doing something wrong. I have a string with the following content: string s = "<span style=\"color: #0000FF;\"><</span>"; You can see that my span contains a 'less than' sign. I process this string with the following code: HtmlDocument htmlDocument = new HtmlDocument(); htmlDocument.LoadHtml(s); But when I take a quick and dirty look in the span like this: htmlDocument

HTML Agility Pack - How can I append an element at the top of the head element?

筅森魡賤 submitted on 2019-12-10 12:31:49
Question: I'm trying to use HTML Agility Pack to add a script element at the top of the head section of my HTML. The examples I have seen so far just use the AppendChild(element) method to accomplish this. I need the script that I am adding to the head section to come before some other scripts. How can I specify this? Here's what I'm trying: HtmlDocument htmlDocument = new HtmlDocument(); htmlDocument.Load(filePath); HtmlNode head = htmlDocument.DocumentNode.SelectSingleNode("/html/head");