How can I strip HTML from text in .NET?

爱一瞬间的悲伤

I have an ASP.NET web page that has a TinyMCE box. Users can format text and send the HTML to be stored in a database.

On the server, I would like to strip the h

9 Answers

挽巷

2020-12-16 02:57
If you are just storing text for indexing, then you probably want to do a bit more than just remove the HTML, such as ignoring stop words and removing words shorter than (say) 3 characters. However, a simple tag stripper I once wrote goes something like this:
```
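    // Requires: using System.Text.RegularExpressions;
    public static string StripTags(string value)
    {
        if (value == null)
            return string.Empty;

        // Replace character entities such as &nbsp; or &#160; with a space.
        value = Regex.Replace(value, @"&#?\w{1,8};", " ");
        // Remove anything that looks like a tag; [^>] also matches newlines.
        return Regex.Replace(value, @"<[^>]*>", string.Empty);
    }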
```
It's old and I'm sure it can be optimised (perhaps using a compiled reg-ex?). But it does work and may help...
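
One way the compiled-regex idea, together with the indexing clean-up mentioned at the top of this answer, might look. This is a rough sketch rather than a tuned implementation. The class name `HtmlText`, the `ToIndexTerms` helper, and the tiny stop-word list are hypothetical and chosen only for illustration; `RegexOptions.Compiled` and `WebUtility.HtmlDecode` are standard .NET APIs. Unlike the code above, this sketch decodes entities instead of replacing them with a space, which is a design choice and not part of the original.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Built once and reused; RegexOptions.Compiled trades start-up cost for faster matching.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "and", "for", "with" };

    public static string StripTags(string value)
    {
        if (string.IsNullOrEmpty(value))
            return string.Empty;

        // Replace each tag with a space so adjacent words do not run together,
        // then decode entities such as &amp; instead of discarding them.
        string withoutTags = TagPattern.Replace(value, " ");
        return WebUtility.HtmlDecode(withoutTags);
    }

    public static IEnumerable<string> ToIndexTerms(string html, int minLength = 3)
    {
        // Split on runs of non-word characters, then drop short words and stop words.
        return Regex.Split(StripTags(html), @"\W+")
            .Where(word => word.Length >= minLength && !StopWords.Contains(word));
    }
}
```

With this sketch, `HtmlText.ToIndexTerms("<p>Hello <b>the</b> world &amp; it</p>")` would yield `Hello` and `world`: "the" is in the stop-word list and "it" is shorter than three characters.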
2020-12-16 02:59

You can use something like this:

```
string strWithoutHtmlTags = Regex.Replace(strWithHTMLTags, "<[^>]*>", string.Empty);
```
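
To show what this pattern does and does not handle, here is a small sketch that wraps the call in a method. The `TagDemo` class and `RemoveTags` name are made up for the example. Note that the pattern only matches tags, so character entities such as `&amp;` are left in the output untouched.

```csharp
using System.Text.RegularExpressions;

public static class TagDemo
{
    public static string RemoveTags(string strWithHTMLTags)
    {
        // Same pattern as above: '<', any run of characters other than '>', then '>'.
        return Regex.Replace(strWithHTMLTags, "<[^>]*>", string.Empty);
    }
}
```

For example, `TagDemo.RemoveTags("<p>Fish &amp; <em>chips</em></p>")` returns `Fish &amp; chips`.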

情书的邮戳

2020-12-16 03:03

```
string str;
using (TextReader tr = new StreamReader(@"Filepath"))
{
    str = tr.ReadToEnd(); // read the whole file; the reader is disposed afterwards
}
str = Regex.Replace(str, "<(.|\n)*?>", string.Empty);
```

For this to compile, you need to reference the following namespace:

System.Text.RegularExpressions

Just take this logic and adapt it for your website.
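
A slightly more compact variant of the file-based approach, offered as a sketch only: `File.ReadAllText` opens, reads, and closes the file in one call, and the `using` directives at the top cover the namespace mentioned above. `FileTagStripper` and `StripTagsFromFile` are hypothetical names, and the simpler `<[^>]*>` pattern is swapped in for `<(.|\n)*?>` because it matches the same text with less backtracking.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FileTagStripper
{
    public static string StripTagsFromFile(string path)
    {
        // ReadAllText closes the file when it returns, so no reader is left open.
        string html = File.ReadAllText(path);

        // [^>] also matches newlines, so tags that span several lines are removed too.
        return Regex.Replace(html, "<[^>]*>", string.Empty);
    }
}
```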
