Grab all text from html with Html Agility Pack

执念已碎 2020-11-28 10:11

Input

foo bar baz

O

6 Answers
  •  心在旅途
    2020-11-28 10:31

    https://github.com/jamietre/CsQuery

    Have you tried CsQuery? Although it is no longer actively maintained, it is still my favorite for converting HTML to text. Here is a one-liner showing how simple it is to get the text from HTML:

    var text = CQ.CreateDocument(htmlText).Text();
    

    Here's a complete console application:

    using System;
    using CsQuery;

    public class Program
    {
        public static void Main()
        {
            // Sample markup: an h1 heading with a p element nested inside it.
            var html = "<h1>Hello World <p>some text inside h1 tag under p tag</p></h1>";

            // Parse the markup and collect the text of all nodes.
            var text = CQ.CreateDocument(html).Text();

            Console.WriteLine(text); // Output: Hello World some text inside h1 tag under p tag
        }
    }

    I understand that the OP asked for HtmlAgilityPack only, but CsQuery is a lesser-known library and one of the best solutions I've found, so I wanted to share it in case someone finds it helpful. Cheers!
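
For readers who want to stay with Html Agility Pack, as the question title asks, the same idea can be sketched roughly as follows. This is a minimal sketch, not taken from the answer above: it assumes the HtmlAgilityPack NuGet package is referenced, and the `HtmlTextSketch` class and its `Extract` method are hypothetical names introduced only for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of Html Agility Pack itself.
public static class HtmlTextSketch
{
    public static string Extract(string html)
    {
        // Parse the markup into an in-memory document.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes found below the document root.
        var raw = doc.DocumentNode.InnerText;

        // Decode character entities such as &amp; into plain characters.
        return HtmlEntity.DeEntitize(raw);
    }
}
```

With the question's sample text, `HtmlTextSketch.Extract("<p>foo <b>bar</b> baz</p>")` should return `foo bar baz`. Whitespace is returned as it appears in the markup, so real pages may need extra trimming or normalization.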
