C#: Is there a LINQ to HTML, or some other good .NET HTML manipulation API?

轮回少年 2020-11-30 07:42

I have a C# WPF application that needs to consume data that is exposed on a webpage as an HTML table.

After getting inspiration from this URL, I tried using LINQ to X

5 Answers
  •  抹茶落季
    2020-11-30 08:21

    I had to do this in a recent project, and I used LINQ to XML. If you know the page will always be clean XHTML, you can probably recursively copy the DOM into an XElement hierarchy without much trouble. Instead, I used the DevComponents HTMLDocument class library (http://www.devcomponents.com/htmldoc/) to convert the HTML to XML and then pulled that into an XElement. This reduces the problem to getting your HTML into an XElement hierarchy. The one caveat is that parsing chokes on script elements, so I deleted those by brute force before loading the XML.

        using System.Collections.Generic;
        using System.Xml;
        using System.Xml.Linq;

        /// <summary>
        /// Converts an HtmlDocument DOM into an XElement DOM that can be queried using LINQ to XML.
        /// </summary>
        /// <param name="htmlDocument">DevComponents HtmlDocument containing the DOM of the page to extract.</param>
        /// <returns>HTML content as an <see cref="XElement"/> for consumption by LINQ to XML.</returns>
        public XElement ExtractXml(HtmlDocument htmlDocument) {
            XmlDocument xmlDoc = htmlDocument.ToXMLDocument();
    
            // Find and remove all script elements from the XML DOM; otherwise XElement.Parse chokes on them.
            // GetElementsByTagName returns a live XmlNodeList, so copy the nodes out before removing any.
            IList<XmlNode> nodes = new List<XmlNode>();
            foreach (XmlNode node in xmlDoc.GetElementsByTagName("script"))
                nodes.Add(node);
            foreach (XmlNode node in nodes)
                node.ParentNode.RemoveChild(node);
    
            return XElement.Parse(xmlDoc.OuterXml);
        }
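    Once the page is in an XElement (from ExtractXml above, or, for clean XHTML, straight from XElement.Parse), reading the table is an ordinary LINQ to XML query. The following is a minimal sketch that is not part of the original answer: HtmlTableReader and ReadRows are hypothetical names, and it assumes the first <table> in the document is the one you want and that its cells hold text rather than structured content.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not from the original answer): reads the cells of an HTML table
// that has already been loaded into a LINQ to XML tree.
public static class HtmlTableReader
{
    // Returns the rows of the first <table> found under 'root', each row as a list of
    // trimmed cell texts (<th> and <td>). Returns an empty list if there is no table.
    public static List<List<string>> ReadRows(XElement root)
    {
        var table = root.DescendantsAndSelf().FirstOrDefault(e => IsNamed(e, "table"));
        if (table == null)
            return new List<List<string>>();

        return table.Descendants()
            .Where(e => IsNamed(e, "tr"))
            // Skip rows that belong to a table nested inside this one.
            .Where(tr => tr.Ancestors().First(a => IsNamed(a, "table")) == table)
            .Select(tr => tr.Elements()
                .Where(cell => IsNamed(cell, "td") || IsNamed(cell, "th"))
                // XElement.Value concatenates all descendant text, e.g. <td><b>3</b> kg</td> gives "3 kg".
                .Select(cell => cell.Value.Trim())
                .ToList())
            .ToList();
    }

    // Compare by local name, ignoring case, so the query works with or without an
    // XHTML default namespace and with upper- or lower-case tag names.
    private static bool IsNamed(XElement e, string name) =>
        string.Equals(e.Name.LocalName, name, StringComparison.OrdinalIgnoreCase);
}
```

    For example, HtmlTableReader.ReadRows(ExtractXml(doc)) would yield one list of cell strings per row, header row included. Matching on LocalName without regard to case is a defensive choice, in case the converter emits uppercase tag names or keeps an XHTML default namespace, either of which would break a query written against plain lowercase XNames.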
    
