I\'m using a framwork which returns malformed Strings with \"empty\" characters from time to time.
\"foobar\" for example is represented by: [,f,o,o,b,a,r]
T
A very simple way to remove the UTF-8 BOM from a string, using substring as Denis Tulskiy suggested. No looping needed. Just checks the first character for the mark and skips it if needed.
public static String removeUTF8BOM(String s) {
if (s.startsWith("\uFEFF")) {
s = s.substring(1);
}
return s;
}
I needed to add this to my code when using the Apache HTTPClient EntityUtil to read from a webserver. The webserver was not sending the blank mark but it was getting pulled in while reading the input stream. Original article can be found here.