How can I strip HTML from text in .NET?

爱一瞬间的悲伤 2020-12-16 02:20

I have an ASP.NET web page that has a TinyMCE box. Users can format text and send the HTML to be stored in a database.

On the server, I would like to strip the h

9 Answers
  • 2020-12-16 02:57

    If you are just storing text for indexing, then you probably want to do a bit more than just remove the HTML, such as ignoring stop-words and removing words shorter than (say) 3 characters. However, a simple tag stripper I once wrote goes something like this:

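        // Requires: using System.Text.RegularExpressions;
        public static string StripTags(string value)
        {
            if (value == null)
                return string.Empty;

            // Replace HTML entities such as &nbsp; or &#160; with a space.
            // Matching only word characters keeps text between two entities intact.
            string pattern = @"&#?\w{1,8};";
            value = Regex.Replace(value, pattern, " ");

            // Remove tags, including ones that span several lines.
            pattern = @"<(.|\n)*?>";
            return Regex.Replace(value, pattern, string.Empty);
        }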
    

    It's old and I'm sure it can be optimised (perhaps using a compiled reg-ex?). But it does work and may help...
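
    The ideas mentioned in this answer (a compiled regex, ignoring stop-words, dropping words shorter than 3 characters) might be sketched roughly as below. The names `IndexText`, `IndexableWords`, and the tiny stop-word list are hypothetical and only for illustration. Note one deliberate swap: entities are decoded with `WebUtility.HtmlDecode` instead of being replaced by a space.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using System.Text.RegularExpressions;

    public static class IndexText
    {
        // Compiled once and reused; Singleline lets '.' match newlines inside a tag.
        private static readonly Regex TagPattern =
            new Regex("<.*?>", RegexOptions.Compiled | RegexOptions.Singleline);

        // Illustrative stop-word list only (an assumption); a real index would use a fuller one.
        private static readonly HashSet<string> StopWords =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "and", "for", "with" };

        public static string StripTags(string value)
        {
            if (string.IsNullOrEmpty(value))
                return string.Empty;

            // Replace tags with a space so adjacent words do not run together,
            // then decode entities such as &amp; or &nbsp;.
            string withoutTags = TagPattern.Replace(value, " ");
            return WebUtility.HtmlDecode(withoutTags);
        }

        public static IEnumerable<string> IndexableWords(string html, int minLength = 3)
        {
            // Split on runs of non-word characters, then drop short words and stop-words.
            return Regex.Split(StripTags(html), @"\W+")
                .Where(w => w.Length >= minLength && !StopWords.Contains(w));
        }
    }
    ```

    For example, `string.Join(" ", IndexText.IndexableWords("<p>The quick&nbsp;brown fox is <b>on</b> it</p>"))` yields `quick brown fox`.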

  • 2020-12-16 02:59

    You can use something like this:

    string strwithouthtmltag;
    strwithouthtmltag = Regex.Replace(strWithHTMLTags, "<[^>]*>", string.Empty);
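
    One caveat with a plain tag pattern like this: it removes the `<script>` and `<style>` tags themselves but keeps the text between them, so JavaScript or CSS can end up in the stored text. A possible sketch that drops those blocks first is shown here. `HtmlText` and `StripTagsAndScripts` are hypothetical names, and a regex remains only an approximation of real HTML parsing.

    ```csharp
    using System.Text.RegularExpressions;

    public static class HtmlText
    {
        // Matches a whole <script>...</script> or <style>...</style> block, contents included.
        private static readonly Regex ScriptOrStyle = new Regex(
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Same tag pattern as in the answer above.
        private static readonly Regex AnyTag = new Regex("<[^>]*>");

        public static string StripTagsAndScripts(string html)
        {
            if (string.IsNullOrEmpty(html))
                return string.Empty;

            // Remove script/style blocks first, then the remaining tags.
            string withoutBlocks = ScriptOrStyle.Replace(html, string.Empty);
            return AnyTag.Replace(withoutBlocks, string.Empty);
        }
    }
    ```

    With this, `HtmlText.StripTagsAndScripts("<script>alert(1)</script>Hello <b>world</b>")` returns `Hello world`.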
    
  • 2020-12-16 03:03
    string str;
    using (TextReader tr = new StreamReader(@"Filepath"))
    {
        str = tr.ReadToEnd();
    }
    str = Regex.Replace(str, "<(.|\n)*?>", string.Empty);
    

    Note that you need to reference the following namespace:

    System.Text.RegularExpressions
    

    You can then reuse this logic in your website.
