I'm using a framework which returns malformed Strings with "empty" characters from time to time.
"foobar" for example is represented by: [,f,o,o,b,a,r]
Regex would be an appropriate way to sanitize the string from unwanted Unicode characters in this case.
String sanitized = dirty.replaceAll("[\uFEFF-\uFFFF]", "");
This replaces every char in the \uFEFF-\uFFFF range with the empty string.
The [...] construct is called a character class, e.g. [aeiou] matches any one of the lowercase vowels, and [^aeiou] matches any character except those.
You can take one of two approaches:
replaceAll("[blacklist]", "")
replaceAll("[^whitelist]", "")
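The two approaches can be sketched as follows. This is only an illustration: the class name Sanitizer and both method names are hypothetical, and the whitelist of printable ASCII is an assumption made for the demo, so you would widen it to whatever characters your data legitimately contains.

```java
public class Sanitizer {
    // Blacklist approach: remove every char in the U+FEFF..U+FFFF range.
    // The escapes are doubled so the regex engine, not the compiler, interprets them.
    public static String stripBlacklisted(String dirty) {
        return dirty.replaceAll("[\\uFEFF-\\uFFFF]", "");
    }

    // Whitelist approach: keep only printable ASCII (space through tilde).
    // ASSUMPTION: this whitelist is chosen for the demo only; it also drops
    // accented letters and any other non-ASCII text.
    public static String keepWhitelisted(String dirty) {
        return dirty.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        // "foobar" preceded by an invisible U+FEFF char, as in the question
        String dirty = "\uFEFFfoobar";

        System.out.println(dirty.length());                   // 7
        System.out.println(stripBlacklisted(dirty).length()); // 6
        System.out.println(stripBlacklisted(dirty));          // foobar
        System.out.println(keepWhitelisted(dirty));           // foobar
    }
}
```

The blacklist is the safer choice when you know exactly which characters are unwanted; the whitelist is stricter and will also remove anything you did not think to allow.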