How do I find out whether a string contains HTML data or not? The user provides input via a web interface, and it's quite possible that they entered either plain text or HTML.
I'm using regex:
[\S\s]*\<html[\S\s]*\>[\S\s]*\<\/html[\S\s]*\>[\S\s]*
So in Java it looks like:
text.matches("[\\S\\s]*\\<html[\\S\\s]*\\>[\\S\\s]*\\<\\/html[\\S\\s]*\\>[\\S\\s]*");
It should match any well-formed (as well as some malformed) XML document that contains an "html" element somewhere, so there may be false positives.
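For anyone who wants to try this out, here is a minimal, self-contained sketch of the same check. The class name HtmlCheck and the method containsHtmlElement are just names I made up for illustration. The escapes before <, > and / are dropped because those characters are not special in Java regexes, and the pattern is precompiled so it is not rebuilt on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class HtmlCheck {
    // Same regex as above, compiled once. [\S\s]* matches any run of
    // characters, including line breaks.
    private static final Pattern HTML_DOC = Pattern.compile(
            "[\\S\\s]*<html[\\S\\s]*>[\\S\\s]*</html[\\S\\s]*>[\\S\\s]*");

    // Returns true if the text contains an opening and a closing "html" tag.
    // matches() requires the whole input to match, just like String.matches().
    static boolean containsHtmlElement(String text) {
        return text != null && HTML_DOC.matcher(text).matches();
    }
}
```

Note that the check is case-sensitive as written, so an upper-case <HTML> tag is not detected, and something like <htmlfoo>x</htmlbar> is accepted, which is one of the false positives mentioned above.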
Edit:
Since posting this, I have removed the last part that matches the closing html element, as I found that some websites don't use it (?!). So if you prefer false positives to false negatives, I encourage you to do the same!
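A rough sketch of the relaxed check, assuming "the last part" means the portion of the regex that matches the closing tag. The names HtmlOpenTagCheck and hasOpeningHtmlTag are hypothetical, chosen only for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class HtmlOpenTagCheck {
    // Only the opening tag is required: "<html", then anything, then ">".
    private static final Pattern OPEN_TAG = Pattern.compile(
            "[\\S\\s]*<html[\\S\\s]*>[\\S\\s]*");

    // Returns true if the text contains "<html" followed somewhere by ">".
    static boolean hasOpeningHtmlTag(String text) {
        return text != null && OPEN_TAG.matcher(text).matches();
    }
}
```

This accepts more input than the stricter version: any text with "<html" followed later by a ">" passes, even if the ">" belongs to a different tag.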