Jeff actually posted about this in Sanitize HTML. But his example is in C# and I'm actually more interested in a Java version. Does anyone have a better version for Java?
Match the whole input against [\s\w\.]*. If it doesn't match, you've got XSS. Maybe. Take note that this expression only allows letters, digits, underscores, whitespace, and periods. It avoids all other symbols, even useful ones, out of fear of XSS. Once you allow &, you've got worries. And merely replacing all instances of & with &amp; is not sufficient. Too complicated to trust :P. Obviously this will disallow a lot of legitimate text (you can just replace all nonmatching characters with a ! or something), but I think it will kill XSS.
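For what it's worth, here is a rough Java sketch of that whitelist check, using only java.util.regex. The class and method names (WhitelistFilter, isSafe, scrub) are made up for illustration and do not come from any library. Note that in Java, \w and \s are ASCII-only by default, so accented letters are rejected too unless you compile the pattern with Pattern.UNICODE_CHARACTER_CLASS.

```java
import java.util.regex.Pattern;

public class WhitelistFilter {
    // Hypothetical names; the pattern is the one from the answer above.
    // The whole input must consist of whitespace, word characters, and periods.
    private static final Pattern ALLOWED = Pattern.compile("[\\s\\w.]*");

    // Complement of the allowed set: any single character outside it.
    private static final Pattern DISALLOWED = Pattern.compile("[^\\s\\w.]");

    // True only if every character of the input is in the allowed set.
    public static boolean isSafe(String input) {
        return ALLOWED.matcher(input).matches();
    }

    // Replaces each disallowed character with '!' instead of rejecting the input.
    public static String scrub(String input) {
        return DISALLOWED.matcher(input).replaceAll("!");
    }

    public static void main(String[] args) {
        String hostile = "<script>alert(1)</script>";
        System.out.println(isSafe("Hello world. It works")); // true
        System.out.println(isSafe(hostile));                 // false
        System.out.println(scrub(hostile));                  // !script!alert!1!!!script!
    }
}
```

matches() has to cover the entire string, so a single stray < or & anywhere makes isSafe return false. Whether you reject outright or scrub is up to you; scrubbing is friendlier but mangles legitimate punctuation such as apostrophes and commas.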
The idea to just parse it as HTML and generate new HTML is probably better.