I have an ASP.NET web page that has a TinyMCE box. Users can format text and send the HTML to be stored in a database.
On the server, I would like to strip the HTML tags from the text.
If you are just storing text for indexing, you probably want to do a bit more than remove the HTML, such as ignoring stop words and removing words shorter than (say) 3 characters. However, a simple tag and entity stripper I once wrote goes something like this:
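using System.Text.RegularExpressions;

public static string StripTags(string value)
{
    if (value == null)
        return string.Empty;

    // Replace HTML entities such as &nbsp; or &amp; with a space
    string pattern = @"&.{1,8};";
    value = Regex.Replace(value, pattern, " ");

    // Remove everything between < and >, including tags that span lines
    pattern = @"<(.|\n)*?>";
    return Regex.Replace(value, pattern, string.Empty);
}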
It's old and I'm sure it can be optimised (perhaps using a compiled regex?), but it does work and may help.
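A minimal sketch of the optimisation suggested above, plus the indexing clean-up mentioned at the start of this answer. Assumptions: the `HtmlText` class, the `ExtractIndexTerms` helper and its tiny stop-word list are hypothetical names for illustration; `<[^>]*>` is swapped in for `<(.|\n)*?>` because it matches the same text (from `<` to the first `>`, across newlines) without the alternation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Each Regex is built once and reused. RegexOptions.Compiled makes
    // construction slower but matching faster on repeated calls.
    private static readonly Regex EntityRegex =
        new Regex(@"&.{1,8};", RegexOptions.Compiled);

    // Same matches as <(.|\n)*?> : [^>]* also matches newlines,
    // so tags spanning several lines are still removed.
    private static readonly Regex TagRegex =
        new Regex(@"<[^>]*>", RegexOptions.Compiled);

    private static readonly Regex WordRegex =
        new Regex(@"\w+", RegexOptions.Compiled);

    // Hypothetical, deliberately tiny stop-word list for illustration only.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "the", "and", "for", "with", "that", "this" };

    public static string StripTags(string value)
    {
        if (value == null)
            return string.Empty;

        // Same two steps as the original: entities become spaces, then tags are removed.
        value = EntityRegex.Replace(value, " ");
        return TagRegex.Replace(value, string.Empty);
    }

    // Hypothetical helper: strips markup, then keeps distinct lower-cased words
    // of at least minLength characters that are not stop words.
    public static List<string> ExtractIndexTerms(string html, int minLength = 3)
    {
        var terms = new List<string>();
        var seen = new HashSet<string>();
        foreach (Match m in WordRegex.Matches(StripTags(html)))
        {
            string word = m.Value.ToLowerInvariant();
            if (word.Length < minLength || StopWords.Contains(word))
                continue;
            if (seen.Add(word))   // Add returns false for duplicates
                terms.Add(word);
        }
        return terms;
    }
}
```

For example, `HtmlText.ExtractIndexTerms("<p>The cat and a dog sat on the mat</p>")` should give `cat`, `dog`, `sat`, `mat`: "the" and "and" are stop words, while "a" and "on" are shorter than 3 characters.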
You can use something like this:
// Remove everything from each < up to the next >
string strwithouthtmltag = Regex.Replace(strWithHTMLTags, "<[^>]*>", string.Empty);
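The pattern above removes tags but leaves entities such as `&amp;` in the text. A sketch that also decodes them, swapping in `System.Net.WebUtility.HtmlDecode` (not part of the original answer); the `HtmlToText` class and `StripAndDecode` method are hypothetical names:

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    private static readonly Regex TagRegex = new Regex("<[^>]*>");

    // Removes tags first, then decodes entities such as &amp; and &lt;.
    // Decoding afterwards matters: "&lt;b&gt;" stays as the literal text "<b>"
    // instead of being turned into a tag and stripped.
    public static string StripAndDecode(string strWithHTMLTags)
    {
        if (strWithHTMLTags == null)
            return string.Empty;

        string strwithouthtmltag = TagRegex.Replace(strWithHTMLTags, string.Empty);
        return WebUtility.HtmlDecode(strwithouthtmltag);
    }
}
```

Note that `&nbsp;` decodes to a non-breaking space (U+00A0), not an ordinary space, so you may want to normalise it before indexing.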
string str;
using (TextReader tr = new StreamReader(@"Filepath")) // reader is closed when the block ends
{
    str = tr.ReadToEnd();
}
str = Regex.Replace(str, "<(.|\n)*?>", string.Empty);
You need to reference the namespaces it uses (System.IO for StreamReader, System.Text.RegularExpressions for Regex):
using System.IO;
using System.Text.RegularExpressions;
Then just apply this logic in your website.
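If the whole file fits comfortably in memory, `File.ReadAllText` can replace the explicit reader, since it opens and closes the file itself. A minimal sketch; `HtmlFile` and `StripFile` are hypothetical names, and the regex is the one from the snippet above:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class HtmlFile
{
    // Hypothetical helper: reads the whole file and removes everything
    // between < and >, including tags that span several lines.
    public static string StripFile(string path)
    {
        string str = File.ReadAllText(path); // no reader to dispose
        return Regex.Replace(str, "<(.|\n)*?>", string.Empty);
    }
}
```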