Grab all text from HTML with Html Agility Pack

执念已碎 2020-11-28 10:11

Input

foo bar baz

O

6 answers
  •  栀梦 (original poster)
    2020-11-28 10:36

    I needed a solution that extracts all text but discards the content of script and style tags. I could not find one anywhere, so I came up with the following, which suits my needs:

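    // Requires: using System.Linq; using System.Text; using HtmlAgilityPack;
    // doc is an already-loaded HtmlDocument.
    StringBuilder sb = new StringBuilder();
    // Select every text node whose direct parent is not <script> or <style>.
    IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants().Where(n =>
        n.NodeType == HtmlNodeType.Text &&
        n.ParentNode.Name != "script" &&
        n.ParentNode.Name != "style");
    foreach (HtmlNode node in nodes) {
        sb.AppendLine(node.InnerText);
    }
    Console.Write(sb.ToString());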
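
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same filter. It assumes the HtmlAgilityPack NuGet package is referenced. The `TextExtractor` class, the `ExtractText` helper, and the sample HTML are hypothetical names and data made up for illustration. The `HtmlEntity.DeEntitize` call and the whitespace check are additions to the snippet above, not part of it.

```csharp
using System;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

static class TextExtractor
{
    // Collects the text of every text node except those directly inside <script> or <style>.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sb = new StringBuilder();
        var textNodes = doc.DocumentNode.Descendants().Where(n =>
            n.NodeType == HtmlNodeType.Text &&
            n.ParentNode.Name != "script" &&
            n.ParentNode.Name != "style");

        foreach (HtmlNode node in textNodes)
        {
            // Decode entities such as &amp; and skip whitespace-only nodes.
            string text = HtmlEntity.DeEntitize(node.InnerText);
            if (!string.IsNullOrWhiteSpace(text))
            {
                sb.AppendLine(text.Trim());
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical sample input, not taken from the question.
        string html = "<html><head><style>p { color: red; }</style></head>" +
                      "<body><p>foo &amp; bar</p><script>var x = 1;</script><p>baz</p></body></html>";
        Console.Write(ExtractText(html));
    }
}
```

The filter only checks the direct parent of each text node, which should be enough here because script and style elements hold their content as plain text rather than nested elements.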
    
