html-agility-pack

HTML Agility Pack strip tags NOT IN whitelist

浪尽此生 submitted on 2019-11-26 04:39:13
Question: I'm trying to create a function that removes HTML tags and attributes which are not in a whitelist. I have the following HTML: <b>first text </b> <b>second text here <a>some text here</a> <a>some text here</a> </b> <a>some twxt here</a> I am using HTML Agility Pack and the code I have so far is: static List<string> WhiteNodeList = new List<string> { "b" }; static List<string> WhiteAttrList = new List<string> { }; static HtmlNode htmlNode; public static void RemoveNotInWhiteList(out
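One way to approach the whitelist problem described above is a recursive, children-first walk of the parsed tree. The following is only a sketch under stated assumptions, not the asker's or any answerer's code: the names `HtmlWhitelist`, `Sanitize`, `Clean`, `AllowedTags`, and `AllowedAttributes` are hypothetical, and it assumes a recent HtmlAgilityPack release in which `RemoveChild(node, true)` keeps the removed node's children in their original order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlWhitelist
{
    // Hypothetical whitelists for illustration; the question allows only <b> and no attributes.
    static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b" };
    static readonly HashSet<string> AllowedAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        Clean(doc.DocumentNode);
        return doc.DocumentNode.InnerHtml;
    }

    static void Clean(HtmlNode node)
    {
        // Snapshot the children with ToList(): the live collection changes
        // whenever a child is removed during the walk.
        foreach (var child in node.ChildNodes.ToList())
        {
            Clean(child);
        }

        // Text, comment, and document nodes are left untouched.
        if (node.NodeType != HtmlNodeType.Element)
        {
            return;
        }

        if (!AllowedTags.Contains(node.Name))
        {
            // Drop the tag itself but keep its (already cleaned) children in place.
            node.ParentNode.RemoveChild(node, true);
            return;
        }

        // The tag is allowed: strip every attribute that is not whitelisted.
        foreach (var attribute in node.Attributes.ToList())
        {
            if (!AllowedAttributes.Contains(attribute.Name))
            {
                node.Attributes.Remove(attribute);
            }
        }
    }
}
```

Calling `HtmlWhitelist.Sanitize(html)` on the sample markup should leave the `<b>` elements and unwrap the `<a>` elements while keeping their text. Cleaning children before their parent means any grandchildren promoted by `RemoveChild` have already been processed.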

HTML Agility pack - parsing tables

时间秒杀一切 submitted on 2019-11-26 03:04:33
Question: I want to use the HTML Agility Pack to parse tables from complex web pages, but I am somewhat lost in the object model. I looked at the link example, but did not find any table data that way. Can I use XPath to get the tables? After loading the data, I am basically lost as to how to get the tables. I have done this in Perl before (HTML::TableParser), and it was a bit clumsy but worked. I would also be happy if someone could just shed light on the right object order for the parsing. Answer 1: How
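As a rough illustration of the XPath route the question asks about, the sketch below loads markup from a string, selects the first `<table>`, and collects the text of each cell row by row. The names `TableReader` and `ReadFirstTable` are hypothetical, and the XPath expressions assume ordinary `table`/`tr`/`th`/`td` markup; this is not taken from the truncated answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableReader
{
    // Returns the first table in the markup as a list of rows, each row a list of cell texts.
    public static List<List<string>> ReadFirstTable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectSingleNode and SelectNodes return null when nothing matches.
        var table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null)
        {
            return rows;
        }

        // ".//tr" is relative to the table; note that it also matches rows of nested tables.
        var rowNodes = table.SelectNodes(".//tr");
        if (rowNodes == null)
        {
            return rows;
        }

        foreach (var tr in rowNodes)
        {
            // Header and data cells that are direct children of this row.
            var cells = tr.SelectNodes("th|td");
            if (cells == null)
            {
                continue;
            }

            rows.Add(cells
                .Select(cell => HtmlEntity.DeEntitize(cell.InnerText).Trim())
                .ToList());
        }

        return rows;
    }
}
```

The object order is therefore document, then `DocumentNode`, then table node, then row nodes, then cell nodes. To read every table on a page, `SelectNodes("//table")` could replace the `SelectSingleNode` call, with the same null check.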

How to use HTML Agility pack

佐手、 submitted on 2019-11-25 22:14:40
Question: How do I use the HTML Agility Pack? My XHTML document is not completely valid, which is why I want to use it. How do I use it in my project? My project is in C#. Answer 1: First, install the HtmlAgilityPack NuGet package into your project. Then, as an example: HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); // There are various options, set as needed htmlDoc.OptionFixNestedTags = true; // filePath is a path to a file containing the html htmlDoc.Load(filePath); // Use: