HTML agility pack - removing unwanted tags without removing content?

忘了有多久 2020-11-29 03:00

I've seen a few related questions here, but none of them describes exactly the problem I am facing.

I want to use the HTML Agility Pack to remove unwa

5 answers
  •  暗喜 (original poster)
    2020-11-29 03:37

    Try the following; you might find it a bit neater than the other proposed solutions:

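    using HtmlAgilityPack;

    // Extension methods must live in a static class; the name HtmlNodeExtensions is arbitrary.
    public static class HtmlNodeExtensions
    {
        // Removes every node matching xPath but keeps its children; returns the number removed.
        public static int RemoveNodesButKeepChildren(this HtmlNode rootNode, string xPath)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            HtmlNodeCollection nodes = rootNode.SelectNodes(xPath);
            if (nodes == null)
                return 0;
            foreach (HtmlNode node in nodes)
                node.RemoveButKeepChildren();
            return nodes.Count;
        }

        // Moves the node's children up to sit before it in its parent, then removes the node itself.
        public static void RemoveButKeepChildren(this HtmlNode node)
        {
            foreach (HtmlNode child in node.ChildNodes)
                node.ParentNode.InsertBefore(child, node);
            node.Remove();
        }

        public static bool TestYourSpecificExample()
        {
            // The markup in this sample was lost; the <p>, <div>, <i>, and <b> tags are assumed from the text.
            string html = "<p>my paragraph </p><div>and my div</div> are <i>italic</i> and <b>bold</b>";
            HtmlDocument document = new HtmlDocument();
            document.LoadHtml(html);
            document.DocumentNode.RemoveNodesButKeepChildren("//div");
            document.DocumentNode.RemoveNodesButKeepChildren("//p");
            return document.DocumentNode.InnerHtml == "my paragraph and my div are <i>italic</i> and <b>bold</b>";
        }
    }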
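
The same unwrapping can also be done with the `HtmlNode.RemoveChild(oldChild, keepGrandChildren)` overload that HtmlAgilityPack provides. Below is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The `UnwrapDemo` class, the `Unwrap` method, and the sample markup are hypothetical names chosen for illustration, not part of the library or of the answer above.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class UnwrapDemo
{
    // Hypothetical helper: removes every node matching xPath but keeps its children.
    public static string Unwrap(string html, string xPath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection matches = document.DocumentNode.SelectNodes(xPath);
        if (matches != null)
        {
            // Copy to a list so the tree is not modified while the result set is enumerated.
            foreach (HtmlNode node in matches.ToList())
                node.ParentNode.RemoveChild(node, true); // true = keep grandchildren
        }
        return document.DocumentNode.InnerHtml;
    }

    public static void Main()
    {
        // Prints the markup with the <div> wrapper removed and its content kept.
        Console.WriteLine(Unwrap("<div>and my <b>div</b></div> are <i>italic</i>", "//div"));
    }
}
```

Calling `Unwrap` once per tag name (for example `"//div"`, then `"//p"`) mirrors the two calls in the test method above. Whether this is neater than the extension methods is a matter of taste; it mainly avoids hand-writing the child-moving loop.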
