Question
I have to store user input text in my database along with its HTML and CSS formatting.
The case is:
Users copy text from MS Word into a RadEditor, and I store that text in the database with its formatting. Then, when I retrieve the data to show it in a report or a label, tags appear wrapped around the text!
I use a regular expression to remove all the formatting, but it only works some of the time, not always.
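using System.Text.RegularExpressions;

private static Regex oClearHtmlScript = new Regex(@"<(.|\n)*?>", RegexOptions.Compiled);

public static string RemoveAllHTMLTags(string sHtml)
{
    // Check for null/empty first: calling Replace on a null string would throw
    if (string.IsNullOrEmpty(sHtml))
        return string.Empty;

    // Entities are decoded BEFORE the tags are stripped, so escaped text
    // such as "&lt;b&gt;" becomes "<b>" and is then removed as if it were a tag
    sHtml = sHtml.Replace("&nbsp;", string.Empty);
    sHtml = sHtml.Replace("&gt;", ">");
    sHtml = sHtml.Replace("&lt;", "<");
    sHtml = sHtml.Replace("&amp;", "&");
    return oClearHtmlScript.Replace(sHtml, string.Empty);
}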
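For comparison, here is a minimal sketch of the same regex approach with the steps reordered, assuming the stored HTML is what Word puts on the clipboard. Word's HTML typically contains `<style>` blocks and `<!--[if gte mso 9]> ... <![endif]-->` comments; a tag-only pattern removes their delimiters but leaves the CSS and XML text between them, which may be what shows up in the report. The class name `PlainText` is hypothetical, and this is still a regex rather than an HTML parser, so unusual markup can defeat it.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is for illustration only.
public static class PlainText
{
    // HTML comments, including Word's <!--[if gte mso 9]> ... <![endif]--> blocks
    private static readonly Regex Comments =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline | RegexOptions.Compiled);

    // <style> and <script> elements: their content is not markup, so a tag-only
    // pattern would leave the CSS or script behind as visible text
    private static readonly Regex StyleOrScript =
        new Regex(@"<(?:style|script)\b[^>]*>.*?</(?:style|script)\s*>",
                  RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Any remaining tag; [^>] also matches '\n', so multi-line tags are covered
    private static readonly Regex Tags = new Regex(@"<[^>]*>", RegexOptions.Compiled);

    public static string RemoveAllHtmlTags(string html)
    {
        // Check for null/empty before calling any method on the string
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        string text = Comments.Replace(html, string.Empty);
        text = StyleOrScript.Replace(text, string.Empty);
        text = Tags.Replace(text, string.Empty);

        // Decode entities only after the tags are gone, so escaped text such as
        // "&lt;b&gt;" survives as the literal characters "<b>"
        text = WebUtility.HtmlDecode(text);

        // &nbsp; decodes to U+00A0; replace it with an ordinary space and trim
        return text.Replace('\u00A0', ' ').Trim();
    }
}
```

The order of the steps matters more than the exact patterns. If the HTML Agility Pack is available, removing the `style`, `script` and comment nodes from the parsed document before reading `InnerText` would likely be more robust than any regex.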
How can I remove all the formatting using the HTML Agility Pack, or any other dependable method, so that the text is guaranteed to be plain?
Note:
The data type of this field in the database is LVARCHAR.
Answer 1:
This should strip out all HTML tags from a string:
sHtml = Regex.Replace(sHtml, "<.*?>", "");
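One possible reason a pattern like this works only some of the time: in .NET, `.` does not match `\n` unless `RegexOptions.Singleline` is set, so a tag whose attributes are wrapped onto a second line is skipped. A minimal comparison, using the hypothetical class name `TagStripDemo`:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the two regex modes.
public static class TagStripDemo
{
    // Answer 1's pattern as written: '.' does not match '\n',
    // so a tag whose attributes wrap onto a new line is left in place
    public static string Default(string html) =>
        Regex.Replace(html, "<.*?>", "");

    // The same pattern with RegexOptions.Singleline, where '.' matches '\n' too
    public static string Singleline(string html) =>
        Regex.Replace(html, "<.*?>", "", RegexOptions.Singleline);
}
```

For the input `"<p\nclass=MsoNormal>Hi</p>"`, `Default` removes only `</p>`, while `Singleline` returns `"Hi"`. The question's own pattern `<(.|\n)*?>` already works around this with `(.|\n)`; `Singleline`, or a negated class such as `<[^>]*>`, does the same without the alternation.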
Answer 2:
This post recommends the following approach (and it seems to have been accepted):
myHTMLString = Regex.Replace(myHTMLString, @"<p>|</p>|<br>|<br />", "\r\n");
myHTMLString = Regex.Replace(myHTMLString, @"<.+?>", string.Empty);
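The first pattern above only matches a bare `<p>`, whereas Word's HTML typically carries attributes (`<p class=MsoNormal>`), and `Regex.Replace` returns a new string rather than modifying its argument. A minimal sketch of the same two-step idea under those assumptions, using the hypothetical name `LineBreakStripper`:

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the name is for illustration only.
public static class LineBreakStripper
{
    public static string ToPlainText(string html)
    {
        // Turn paragraph ends and <br>, <br/>, <br /> into line breaks.
        // IgnoreCase also catches uppercase variants such as </P> or <BR/>.
        string text = Regex.Replace(html, @"</p\s*>|<br\s*/?>", "\r\n", RegexOptions.IgnoreCase);

        // Remove every remaining tag, including opening <p class=...> tags
        text = Regex.Replace(text, @"<[^>]*>", string.Empty);

        // Decode entities last so escaped characters are kept as text
        return WebUtility.HtmlDecode(text);
    }
}
```

Replacing only `</p>` rather than both `<p>` and `</p>` avoids a blank line between paragraphs. A `<br>` carrying attributes (for example `<br clear=all>`) is not matched by the first pattern, so it is stripped without producing a line break.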
Given that you're still having difficulty, could you try instantiating a RadEditor and using its .Text property? I've not used RadEditor before, but I did some digging. Could you try something like this:
RadEditor editor = new RadEditor();
editor.Content = myHTMLString;
string plainText = editor.Text;
This is probably a VERY expensive operation, but I'd be interested to know if it works!
Answer 3:
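The HTML Agility Pack makes working with HTML easy.
using HtmlAgilityPack;
HtmlDocument mainDoc = new HtmlDocument();
string htmlString = "<html><body><h1>Test</h1> more text</body></html>";
mainDoc.LoadHtml(htmlString);
// InnerText keeps only the text nodes; DeEntitize then decodes entities such as &amp; or &nbsp;
string cleanText = HtmlEntity.DeEntitize(mainDoc.DocumentNode.InnerText);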
Answer 4:
See my answer here for how it can be done using the Agility Pack. You may have to change the code a little so that it doesn't strip out words shorter than two characters, though. Line breaks will also be removed, so you'll be left with one long line of text.
Source: https://stackoverflow.com/questions/16303828/how-to-remove-all-tags-and-get-the-pure-text