I'm working on a web application that allows users to type short descriptions of items in a catalog. I'm allowing Markdown in my textareas so users can do some HTML format
I think stripping any HTML tag from the input will get you something pretty secure -- except if someone finds a way to inject some really messed-up data into Markdown, having it generate some even more messed-up output ^^
Still, here are two things that come to mind:
First one: strip_tags is not a miracle function; it has some flaws...
For instance, it'll strip everything after the '<' in a situation like this one:
$str = "10 appels is
The output I get is:
string '10 appels is ' (length=13)
Which is not that nice for your users :-(
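To make the flaw easier to reproduce, here is a minimal sketch with two made-up strings (they are my own illustration, not the input from above). As far as I can tell, strip_tags treats a '<' directly followed by a non-whitespace character as the start of a tag, while a '<' followed by a space is left alone:

```php
<?php
// Hypothetical inputs, chosen only to illustrate the behaviour.
$noSpace   = "price <than 20 euros";  // '<' directly followed by a letter
$withSpace = "price < 20 euros";      // '<' followed by a space

$strippedNoSpace   = strip_tags($noSpace);
$strippedWithSpace = strip_tags($withSpace);

// Everything from the '<' onwards is treated as an unclosed tag and dropped.
var_dump($strippedNoSpace);

// The lone '<' is kept, so the text survives intact.
var_dump($strippedWithSpace);
```

So a user who types a "less than" sign without a space after it silently loses the rest of the description.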
Second one: one day or another, you might want to allow some HTML tags/attributes; or, even today, you might want to be sure that Markdown doesn't generate certain HTML tags/attributes.
You might be interested in something like HTMLPurifier: it lets you specify which tags and attributes should be kept, and filters a string so that only those remain.
It also generates valid HTML code -- which is always nice ;-)
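As a rough sketch of what that could look like: the function below assumes HTML Purifier has been installed through Composer (package ezyang/htmlpurifier) and that the autoloader is already loaded. The function name purifyDescription and the list of allowed tags are just my assumptions for a catalog description, so adjust them to your needs. You would run it on the HTML that Markdown produces, not on the raw user input:

```php
<?php
// Assumes HTML Purifier is installed (e.g. composer require ezyang/htmlpurifier)
// and that vendor/autoload.php has already been required by the application.

// Hypothetical helper: keeps only a small whitelist of tags and attributes.
function purifyDescription(string $html): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Whitelist of elements and attributes; anything else is removed.
    $config->set('HTML.Allowed', 'p,em,strong,ul,ol,li,a[href]');

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($html);
}

// Usage: $safeHtml = purifyDescription($htmlGeneratedByMarkdown);
```

Building the purifier on each call is fine for a sketch; in a real application you would probably create it once and reuse it.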