How to strip comments from HTML using Agility Pack without losing DOCTYPE

笑着哭i 提交于 2019-11-28 03:08:07

问题


I am trying to remove unnecessary content from HTML. Specifically I want to remove comments. I found a pretty good solution (Grabbing meta-tags and comments using HTML Agility Pack) however the DOCTYPE is treated as a comment and therefore removed along with the comments. How can I improve the code below to make sure the DOCTYPE is preserved?

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlContent);
var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
if (nodes != null)
{
    foreach (HtmlNode comment in nodes)
    {
        comment.ParentNode.RemoveChild(comment);
    }
}

回答1:


Check that comment does not start with DOCTYPE

  foreach (var comment in nodes)
  {
     if (!comment.InnerText.StartsWith("DOCTYPE"))
         comment.ParentNode.RemoveChild(comment);
  }



回答2:


doc.DocumentNode.Descendants()
 .Where(n => n.NodeType == HtmlAgilityPack.HtmlNodeType.Comment)
 .ToList()
 .ForEach(n => n.Remove());

this will strip off all comments from the document



来源:https://stackoverflow.com/questions/6567484/how-to-strip-comments-from-html-using-agility-pack-without-losing-doctype

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!