How can I strip HTML from text in .NET?

爱一瞬间的悲伤 2020-12-16 02:20

I have an ASP.NET web page that contains a TinyMCE editor. Users can format text and submit the resulting HTML, which is stored in a database.

On the server, I would like to strip the h

9 answers
  • 2020-12-16 02:40

    I downloaded the HtmlAgilityPack and wrote this function:

    // Requires: using System.Text.RegularExpressions; using System.Web;
    string StripHtml(string html)
    {
        // insert a space after each tag so that words from adjacent elements do not run together
        html = html.Replace(">", "> ");

        // parse the HTML
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // take the text content of the document and decode HTML entities
        string text = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);

        // collapse each run of whitespace into a single space and trim both ends
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
    
  • 2020-12-16 02:41

    You can use the HTQL COM component and run this query against the source: <body> &tx;

  • 2020-12-16 02:51

    Take a look at this: "Strip HTML tags from a string using regular expressions".

  • 2020-12-16 02:52

    You could:

    • Use a plain old TEXTAREA (styled for height/width/font/etc.) rather than TinyMCE.
    • Use TinyMCE's built-in configuration options for stripping unwanted HTML.
    • Use HttpUtility.HtmlDecode(Regex.Replace(mystring, "<[^>]+>", "")) on the server.
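
The third option above can be fleshed out roughly as follows. This is only a sketch under a few assumptions: the class and method names (HtmlText, StripTags) are made up for illustration, WebUtility.HtmlDecode from System.Net stands in for HttpUtility.HtmlDecode so that no reference to System.Web is needed, and each tag is replaced with a space rather than an empty string so that words from adjacent elements do not run together.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper; the name and null handling are illustrative assumptions.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace every tag-like run "<...>" with a single space.
        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");

        // Decode entities such as &amp; back into plain characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse each run of whitespace into one space and trim both ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

A call such as HtmlText.StripTags(editorHtml), where editorHtml is whatever string the page receives from TinyMCE, would then yield the plain text. Note that a pattern like this leaves the contents of script and style elements in the output and can be confused by a ">" inside an attribute value, so it suits simple, well-formed markup; for malformed input a real parser is the safer choice.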
  • 2020-12-16 02:52

    Since you may have malformed HTML in the system, BeautifulSoup or a similar lenient parser could be used.

    It is written in Python, and I am not sure how it could be called from .NET. Perhaps through IronPython?

  • 2020-12-16 02:57

    Here's a link to Jeff Atwood's RefactorMe code for his Sanitize HTML method.
