html-agility-pack

Select all <p>'s from a Node's children using HTMLAgilityPack

不羁岁月 submitted on 2019-12-07 05:23:33
Question: I've got the following code, which fetches an HTML page, makes the URLs absolute, and then marks the links rel nofollow and makes them open in a new window/tab. My issue is with adding the attributes to the <a> elements. string url = "http://www.mysite.com/"; string strResult = ""; HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url); HttpWebResponse response = (HttpWebResponse)request.GetResponse(); if ((request.HaveResponse) && (response.StatusCode == HttpStatusCode.OK)) { using
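One way to add the attributes described in the excerpt above is sketched here. It is not the asker's code: the class name LinkRewriter, the method name MarkLinks, and the sample markup are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class LinkRewriter // hypothetical name, for illustration only
{
    // Adds rel="nofollow" and target="_blank" to every <a> that has an href.
    public static string MarkLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode a in anchors)
            {
                // SetAttributeValue creates the attribute or overwrites an existing one.
                a.SetAttributeValue("rel", "nofollow");
                a.SetAttributeValue("target", "_blank");
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(MarkLinks("<p><a href=\"/about\">About</a></p>"));
    }
}
```

The null check matters because an XPath query with no matches yields null, which would otherwise throw inside the foreach.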

HTMLAgilityPack - You need to set UseIdAttribute property to true to enable this feature

我们两清 submitted on 2019-12-06 22:58:54
Question: I am trying to use HTMLAgilityPack with VS2008/.NET 3.5. I get this error even if I set OptionUseIdAttribute to true, though it is supposed to be true by default. Error message: You need to set UseIdAttribute property to true to enable this feature. Stack trace: at HtmlAgilityPack.HtmlDocument.GetElementbyId(String id). I tried versions 1.4.6 and 1.4.0; neither worked. Version 1.4.6 - Net20/HtmlAgilityPack.dll. Version 1.4.0 - Net20/HtmlAgilityPack.dll. This is the code: HtmlWeb web = new

Html Agility Pack - New HtmlAttribute

谁都会走 submitted on 2019-12-06 22:34:12
Question: Using Html Agility Pack in C#, I have a node I'd like to add an attribute to. Currently the node is an <li> element with no attributes, and I'd like to add a class of "active" to it. It looks like the best thing to use would be node.Attributes.Add(attrClass), where attrClass is an HtmlAttribute of class="active". However, if I try to define a new HtmlAttribute I get an error stating that it doesn't have any constructors, e.g. HtmlAttribute attrClass = new HtmlAttribute(); Is there something wrong
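A minimal sketch of adding the class without calling an HtmlAttribute constructor, assuming the HtmlAgilityPack NuGet package is referenced. The class name ActiveClassExample, the method name AddActiveClass, and the sample list markup are hypothetical. It relies on the Attributes.Add(name, value) overload; HtmlDocument.CreateAttribute(name, value) is another route when an HtmlAttribute instance is needed.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class ActiveClassExample // hypothetical name, for illustration only
{
    // Returns the first <li> of the fragment with class="active" added.
    public static HtmlNode AddActiveClass(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode li = doc.DocumentNode.SelectSingleNode("//li");
        if (li != null)
        {
            // The name/value overload builds the attribute internally,
            // so no public HtmlAttribute constructor is required.
            li.Attributes.Add("class", "active");
        }
        return li;
    }

    public static void Main()
    {
        HtmlNode li = AddActiveClass("<ul><li>Home</li><li>About</li></ul>");
        Console.WriteLine(li.OuterHtml);
    }
}
```

If the element might already carry a class attribute, li.SetAttributeValue("class", "active") overwrites it instead of adding a second one.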

HTMLagilitypack is not removing all html tags. How can I solve this efficiently?

巧了我就是萌 submitted on 2019-12-06 20:25:32
Question: I am using the following method to strip all HTML from a string: public static string StripHtmlTags(string html) { if (String.IsNullOrEmpty(html)) return ""; HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.LoadHtml(html); return doc.DocumentNode.InnerText; } But it seems to ignore the following tag: […] So the returned string is basically: > A hungry thief who stole a rack of pork ribs from a grocery store has > been sentenced to spend 50 years in prison. Willie Smith Ward

Delete all elements between two elements

孤者浪人 submitted on 2019-12-06 15:28:41
I have about 2500 HTML files of different standards, and I need to remove the footer part of each. The HTML code below is the footer of one of my files; I need to remove the two hr elements and the elements between them. So far I have only tried targeting the hr element with XPath (and HTML Agility Pack), using SelectSingleNode and DocumentNode.SelectNodes("//hr");, and then iterating with a foreach. But I am too much of a noob to use XPath properly, and I don't know how to select the node and its siblings(?) to delete them. This is what I've got so far, with the help of this community. :) private
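The sibling walk the question asks about could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced and, importantly, that both hr elements share the same parent; the excerpt does not show the real footer markup, so the sample HTML and the names FooterRemover and RemoveBetweenHr are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class FooterRemover // hypothetical name, for illustration only
{
    // Removes the first <hr>, the next sibling <hr>, and every node between them.
    // Assumption: both <hr> elements are siblings under the same parent.
    public static string RemoveBetweenHr(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode first = doc.DocumentNode.SelectSingleNode("//hr");
        if (first == null) return html;

        // Collect first, remove later, so the sibling chain is not
        // modified while it is being walked.
        var toRemove = new List<HtmlNode>();
        bool foundSecond = false;
        for (HtmlNode n = first; n != null; n = n.NextSibling)
        {
            toRemove.Add(n);
            if (n != first && n.Name == "hr")
            {
                foundSecond = true;
                break;
            }
        }
        if (!foundSecond) return html; // no closing <hr>: leave the input untouched

        foreach (HtmlNode n in toRemove) n.Remove();
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string sample = "<div><p>keep</p><hr><p>footer</p><hr><p>after</p></div>";
        Console.WriteLine(RemoveBetweenHr(sample));
    }
}
```

NextSibling also visits text nodes, so whitespace and loose text between the two hr elements are removed along with the elements.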

parsing html with HTMLAGILITYPACK and loading into datatable C#

大兔子大兔子 submitted on 2019-12-06 14:18:17
Question: I have HTML that looks like this: <body class="style_0"> <div> <div class="style_1">Pending Test List</div> <table style=" width: 100%;" id="AUTOGENBOOKMARK_4365445353431356880"> <col> <col> <tbody> <tr> <td style="vertical-align: baseline;"> <div class="style_4">Pending Test List</div> </td> <td style="vertical-align: baseline;"> <div class="style_5">SOME AGENCY Laboratories, Inc.</div> </td> </tr> </tbody> </table> <table class="style_6" style=" width: 4.531in;" id="AUTOGENBOOKMARK

C# htmlagility pack, capturing redirect

血红的双手。 submitted on 2019-12-06 07:29:56
Hi all, this one is really simple (I hope). I'm using htmlagility pack to do my web crawling. What happens if I input some URL that then redirects me to a new URL? How do I capture that new, redirected URL? If htmlagility pack doesn't have a way, can someone suggest another method? When you create your HttpWebRequest you can set the AllowAutoRedirect property to true and it will automatically follow any redirects you have. HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://www.contoso.com"); myHttpWebRequest.MaximumAutomaticRedirections=1; myHttpWebRequest

C# HtmlAgilityPack parse <ul>

ぃ、小莉子 submitted on 2019-12-06 07:24:07
Question: I want to parse the following HTML. What I currently have is var node = document.DocumentNode.SelectSingleNode("//div[@class='wrapper']"); The HTML is <div class="wrapper"> <ul> <li data="334040566050326217"> <span>test1</span> </li> <li data="334040566050326447"> <span>test2</span> </li> </ul> I need to get the number from the li data attribute and the value between the span tags. Any help appreciated. Answer 1: Something like this might suit your needs. //Assumes your document is loaded into a variable
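The answer above is cut off, so the following is an independent sketch rather than a reconstruction of it. It assumes the HtmlAgilityPack NuGet package is referenced; the names ListParser and ReadItems are hypothetical, and the sample markup is the snippet from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class ListParser // hypothetical name, for illustration only
{
    // Returns (data attribute, span text) pairs for each <li> under div.wrapper.
    public static List<KeyValuePair<string, string>> ReadItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();
        HtmlNodeCollection items =
            doc.DocumentNode.SelectNodes("//div[@class='wrapper']//li");
        if (items == null) return result; // SelectNodes returns null on no match

        foreach (HtmlNode li in items)
        {
            // The second argument is the default used when the attribute is missing.
            string data = li.GetAttributeValue("data", "");
            HtmlNode span = li.SelectSingleNode(".//span");
            string text = span == null ? "" : span.InnerText.Trim();
            result.Add(new KeyValuePair<string, string>(data, text));
        }
        return result;
    }

    public static void Main()
    {
        string html = "<div class=\"wrapper\"> <ul> " +
            "<li data=\"334040566050326217\"> <span>test1</span> </li> " +
            "<li data=\"334040566050326447\"> <span>test2</span> </li> </ul>";
        foreach (var item in ReadItems(html))
            Console.WriteLine(item.Key + " = " + item.Value);
    }
}
```

The data value is kept as a string; parsing it with long.Parse is left to the caller, since the excerpt does not say how the number is used.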

HTML Agility Pack

半世苍凉 submitted on 2019-12-06 06:55:55
Question: I have HTML tables in one webpage, like <table border=1> <tr><td>sno</td><td>sname</td></tr> <tr><td>111</td><td>abcde</td></tr> <tr><td>213</td><td>ejkll</td></tr> </table> <table border=1> <tr><td>adress</td><td>phoneno</td><td>note</td></tr> <tr><td>asdlkj</td><td>121510</td><td>none</td></tr> <tr><td>asdlkj</td><td>214545</td><td>none</td></tr> </table> Now, using Html Agility Pack, I want to extract only the data in the address and phone number columns from this webpage. It means for that I have
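A rough sketch of one approach to the table question above: locate the table whose header row contains both column names, then read those two cells from each data row. It assumes the HtmlAgilityPack NuGet package is referenced and that the header cells read exactly "adress" and "phoneno" as spelled in the sample markup; the names ColumnExtractor and ReadAddressAndPhone are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class ColumnExtractor // hypothetical name, for illustration only
{
    // Returns (adress, phoneno) pairs from the first table that has both headers.
    public static List<KeyValuePair<string, string>> ReadAddressAndPhone(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<KeyValuePair<string, string>>();

        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return result;

        foreach (HtmlNode table in tables)
        {
            HtmlNodeCollection rows = table.SelectNodes(".//tr");
            if (rows == null || rows.Count == 0) continue;

            HtmlNodeCollection headers = rows[0].SelectNodes("td|th");
            if (headers == null) continue;

            // Find the column positions by header text ("adress" is the sample's spelling).
            int addr = -1, phone = -1;
            for (int i = 0; i < headers.Count; i++)
            {
                string name = headers[i].InnerText.Trim();
                if (name == "adress") addr = i;
                if (name == "phoneno") phone = i;
            }
            if (addr < 0 || phone < 0) continue; // not the table we want

            for (int r = 1; r < rows.Count; r++)
            {
                HtmlNodeCollection cells = rows[r].SelectNodes("td");
                if (cells == null || cells.Count <= Math.Max(addr, phone)) continue;
                result.Add(new KeyValuePair<string, string>(
                    cells[addr].InnerText.Trim(), cells[phone].InnerText.Trim()));
            }
            break; // stop after the first matching table
        }
        return result;
    }

    public static void Main()
    {
        string html =
            "<table border=1><tr><td>sno</td><td>sname</td></tr>" +
            "<tr><td>111</td><td>abcde</td></tr></table>" +
            "<table border=1><tr><td>adress</td><td>phoneno</td><td>note</td></tr>" +
            "<tr><td>asdlkj</td><td>121510</td><td>none</td></tr>" +
            "<tr><td>asdlkj</td><td>214545</td><td>none</td></tr></table>";
        foreach (var row in ReadAddressAndPhone(html))
            Console.WriteLine(row.Key + " / " + row.Value);
    }
}
```

Matching on header text rather than on table position keeps the lookup working if the order of the tables on the page changes.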

Need to replace an img src attrib with new value

北战南征 submitted on 2019-12-06 06:00:55
I'm retrieving the HTML of many webpages (saved earlier) from SQL Server. My goal is to modify an img's src attribute. There is only one img tag in the HTML, and its source looks like this: ... <td colspan="3" align="center"> <img src="/crossword/13cnum1.gif" height="360" width="360" border="1"><br></td> ... I need to change /crossword/13cnum1.gif to http://www.nostrotech.com/crossword/13cnum1.gif Code: private void ReplaceTest() { String currentCode = string.Empty; Cursor saveCursor = Cursor.Current; try { Cursor.Current = Cursors.WaitCursor; foreach (WebData oneWebData in DataContext
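The src rewrite described above might be sketched as follows, leaving out the SQL Server and DataContext parts, which the excerpt truncates. It assumes the HtmlAgilityPack NuGet package is referenced; the names ImgSrcFixer and MakeImgSrcAbsolute are hypothetical, and the base URL is passed in rather than hard-coded.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class ImgSrcFixer // hypothetical name, for illustration only
{
    // Rewrites the src of the first <img> so that it is absolute against baseUrl.
    public static string MakeImgSrcAbsolute(string html, string baseUrl)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode img = doc.DocumentNode.SelectSingleNode("//img[@src]");
        if (img != null)
        {
            // Uri(Uri, string) resolves a relative reference against the base
            // and leaves an already absolute URL as it is.
            var absolute = new Uri(new Uri(baseUrl), img.GetAttributeValue("src", ""));
            img.SetAttributeValue("src", absolute.AbsoluteUri);
        }
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string html = "<td colspan=\"3\" align=\"center\"> " +
            "<img src=\"/crossword/13cnum1.gif\" height=\"360\" width=\"360\" border=\"1\"><br></td>";
        Console.WriteLine(MakeImgSrcAbsolute(html, "http://www.nostrotech.com"));
    }
}
```

Resolving through Uri instead of concatenating strings avoids doubled or missing slashes between the host and the path.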