How can I strip HTML from text in .NET?

爱一瞬间的悲伤 2020-12-16 02:20

I have an asp.net web page that has a TinyMCE box. Users can format text and send the HTML to be stored in a database.

On the server, I would like to strip the h

9 answers
  •  一个人的身影
    2020-12-16 02:40

    I downloaded the HtmlAgilityPack and created this function:

    // Requires the HtmlAgilityPack package, plus:
    //   using System.Text.RegularExpressions;  (Regex)
    //   using System.Web;                      (HttpUtility)
    string StripHtml(string html)
    {
        // Insert a space after every tag so that words in adjacent elements do not run together.
        html = html.Replace(">", "> ");

        // Parse the HTML.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // Take the text content of the document and decode HTML entities such as &amp;.
        string text = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);

        // Collapse each run of whitespace into a single space and trim both ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
    
