I am trying to write a regular expression in C# to remove all script tags and anything contained within them.
So far I have come up with the following: \\<(
You can't reliably parse HTML with regular expressions.
Use the HTML Agility Pack instead.
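As a rough sketch of what that could look like, assuming the third-party HtmlAgilityPack NuGet package is installed (the helper name RemoveScripts and the sample input are made up for illustration):

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of the base class library

// Hypothetical helper: loads the markup, removes every <script> element
// together with its contents, and returns the remaining HTML.
string RemoveScripts(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // SelectNodes returns null (not an empty collection) when nothing matches.
    var scripts = doc.DocumentNode.SelectNodes("//script");
    if (scripts != null)
    {
        foreach (var node in scripts)
        {
            node.Remove(); // detaches the element and everything inside it
        }
    }

    return doc.DocumentNode.OuterHtml;
}

// Assumed sample input, only for demonstration.
Console.WriteLine(RemoveScripts("<p>a</p><script>alert(1)</script><p>b</p>"));
```

Because the document is parsed into a tree first, script tags with attributes or unusual spacing should be handled the same way as plain ones.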
This regular expression does the trick just fine:
\<(?:[^:]+:)?script\>.*?\<\/(?:[^:]+:)?script\>
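For reference, here is a minimal sketch of how the pattern above might be applied in C#. The helper name StripScripts and the sample input are assumptions for illustration; RegexOptions.Singleline is added so that `.` also matches line breaks inside a script block.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the pattern from this answer and
// replaces every match with the empty string.
string StripScripts(string html)
{
    // Pattern copied from the answer, written as a verbatim string.
    const string pattern = @"\<(?:[^:]+:)?script\>.*?\<\/(?:[^:]+:)?script\>";

    // Singleline: '.' matches newlines; IgnoreCase: also matches <SCRIPT>.
    return Regex.Replace(html, pattern, string.Empty,
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
}

// Assumed sample input, only for demonstration.
string cleaned = StripScripts("<p>a</p><script>alert(1)</script><p>b</p>");
Console.WriteLine(cleaned); // prints "<p>a</p><p>b</p>"
```

Note that the pattern as written expects `>` directly after `script`, so an opening tag with attributes such as `<script type="text/javascript">` is not matched.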
You will run into a problem with this simple HTML:
<script>
var s = "<script></script>";
</script>
How are you going to solve this problem? It is smarter to use the HTML Agility Pack for such things.
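To make the failure concrete, here is a small sketch that runs the regular expression from the previous answer against that HTML. The helper name StripWithRegex is made up for illustration, and RegexOptions.Singleline is an assumption so that `.` can cross the line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the regex from the previous answer.
string StripWithRegex(string html)
{
    const string pattern = @"\<(?:[^:]+:)?script\>.*?\<\/(?:[^:]+:)?script\>";
    return Regex.Replace(html, pattern, string.Empty, RegexOptions.Singleline);
}

// The three-line HTML from this answer.
string html = "<script>\nvar s = \"<script></script>\";\n</script>";

// The lazy .*? stops at the first </script>, which sits inside the
// string literal, so the tail of the real script block is left behind.
string leftover = StripWithRegex(html);
Console.WriteLine(leftover);
// leftover is: ";\n</script>   (a quote, a semicolon, a newline, then </script>)
```

The regex has no way of knowing that the first closing tag is part of a JavaScript string, which is why the remainder of the block survives.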