I'm using a framework which returns malformed Strings with "empty" characters from time to time.
"foobar", for example, is represented by: [,f,o,o,b,a,r]
A regex is an appropriate way to strip unwanted Unicode characters from the string in this case.
String sanitized = dirty.replaceAll("[\uFEFF-\uFFFF]", "");
This will replace every char in the \uFEFF-\uFFFF range with the empty string.
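A minimal runnable sketch of the one-liner above, assuming the stray character is a leading U+FEFF as in the "foobar" example; the class and method names are hypothetical. Doubling the backslashes (`\\uFEFF`) hands the escape to the regex engine instead of javac, which keeps the compiled pattern readable; both spellings match the same range.

```java
public class StripRange {
    // Removes every char in the range U+FEFF..U+FFFF (this includes the BOM).
    static String sanitize(String dirty) {
        // Doubled backslashes: the regex engine, not javac, decodes these escapes.
        return dirty.replaceAll("[\\uFEFF-\\uFFFF]", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: a BOM followed by "foobar" (7 chars in total).
        String dirty = "\uFEFFfoobar";
        String clean = sanitize(dirty);
        System.out.println(dirty.length() + " -> " + clean.length()); // 7 -> 6
        System.out.println(clean); // foobar
    }
}
```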
The [...] construct is called a character class; e.g. [aeiou] matches any one of the lowercase vowels, and [^aeiou] matches anything but those.
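A small illustration of the two character-class forms using the vowel example; the input word and the class and method names are arbitrary choices for the demo.

```java
public class CharClassDemo {
    // [aeiou] matches any single lowercase vowel, so this deletes the vowels.
    static String dropVowels(String s) {
        return s.replaceAll("[aeiou]", "");
    }

    // [^aeiou] matches any char that is NOT a lowercase vowel, so only vowels survive.
    static String keepVowels(String s) {
        return s.replaceAll("[^aeiou]", "");
    }

    public static void main(String[] args) {
        System.out.println(dropVowels("programming")); // prgrmmng
        System.out.println(keepVowels("programming")); // oai
    }
}
```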
You can take one of these two approaches:
replaceAll("[blacklist]", "") to remove only the characters you list, or
replaceAll("[^whitelist]", "") to remove everything except the characters you list.
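A sketch of both approaches side by side. The blacklist set (BOM plus zero-width space, U+200B) and the whitelist set (printable ASCII, 0x20 through 0x7E) are illustrative assumptions, not requirements. Note that the ASCII whitelist also discards legitimate non-ASCII text such as an accented "e".

```java
public class ListFilters {
    // Blacklist: delete only the chars listed inside [...].
    // Assumed set: U+FEFF (BOM) and U+200B (zero-width space).
    static String removeBlacklisted(String s) {
        return s.replaceAll("[\\uFEFF\\u200B]", "");
    }

    // Whitelist: delete everything NOT listed inside [^...].
    // Assumed set: printable ASCII, space (0x20) through tilde (0x7E).
    static String keepWhitelisted(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: BOM + "foo" + zero-width space + "bar" + e-acute.
        String dirty = "\uFEFFfoo\u200Bbar\u00E9";
        System.out.println(removeBlacklisted(dirty)); // "foobar" followed by the e-acute
        System.out.println(keepWhitelisted(dirty));   // foobar
    }
}
```

The blacklist is safe but only catches what you anticipated; the whitelist catches everything unexpected but can silently drop valid input.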
Trimming on the left or right only removes whitespace. Does it have a colon before the space?
Even better: int code = string.charAt(0); will show you the char code, and then you can use replace() or substring().
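Because a Java String is not an array, `string[0]` does not compile; `charAt` is the way to read a single char. A small helper (hypothetical name `describe`) that prints every UTF-16 unit as `U+XXXX` makes invisible characters, like the leading one in the "foobar" example, easy to spot.

```java
public class CharCodes {
    // Formats each UTF-16 unit of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: a BOM followed by "foo".
        String s = "\uFEFFfoo";
        System.out.println((int) s.charAt(0)); // 65279
        System.out.println(describe(s));       // U+FEFF U+0066 U+006F U+006F
    }
}
```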
Thank you, Johannes Rössel. It was actually '\uFEFF'.
The following code works:
final StringBuilder sb = new StringBuilder();
// Copy every char except U+FEFF (byte order mark / zero-width no-break space).
for (final char character : body.toCharArray()) {
    if (character != '\uFEFF') {
        sb.append(character);
    }
}
final String sanitizedString = sb.toString();
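The loop can likely be replaced by `String.replace(CharSequence, CharSequence)`, which replaces every literal occurrence without involving the regex engine. The sketch below (hypothetical class and method names) checks that both versions agree on a sample input.

```java
public class StripBom {
    // The loop version: copy every char except U+FEFF.
    static String loopVersion(String body) {
        final StringBuilder sb = new StringBuilder();
        for (final char character : body.toCharArray()) {
            if (character != '\uFEFF') {
                sb.append(character);
            }
        }
        return sb.toString();
    }

    // Same result via a literal (non-regex) replace of every occurrence.
    static String replaceVersion(String body) {
        return body.replace("\uFEFF", "");
    }

    public static void main(String[] args) {
        // Hypothetical input with two BOM characters.
        String body = "\uFEFFfoo\uFEFFbar";
        System.out.println(loopVersion(body));    // foobar
        System.out.println(replaceVersion(body)); // foobar
    }
}
```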
Does anyone know a way to include only a range of valid characters, instead of excluding 95% of the Unicode range?
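One way to whitelist without enumerating code-point ranges is to use Unicode general categories inside the character class. The sketch below assumes "valid" means letters (`\p{L}`), combining marks (`\p{M}`), numbers (`\p{N}`), punctuation (`\p{P}`), symbols (`\p{S}`), space separators (`\p{Zs}`) plus tab, CR and LF; everything else is dropped, including format characters (category Cf) such as U+FEFF and U+200B. The class name and the exact category set are assumptions to adjust to your data.

```java
import java.util.regex.Pattern;

public class UnicodeWhitelist {
    // Matches any code point OUTSIDE the assumed whitelist of categories.
    private static final Pattern NOT_ALLOWED =
            Pattern.compile("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{S}\\p{Zs}\\t\\r\\n]");

    static String sanitize(String dirty) {
        return NOT_ALLOWED.matcher(dirty).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: BOM, zero-width space and an accented letter.
        String dirty = "\uFEFFfoo\u200Bbar, caf\u00E9!";
        // Prints "foobar, caf" + e-acute + "!": format chars removed, letters kept.
        System.out.println(sanitize(dirty));
    }
}
```

Compiling the Pattern once as a constant avoids recompiling the regex on every call, which `String.replaceAll` does internally each time it is invoked.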