How do I remove strange and unwanted Unicode characters (such as a black diamond with question mark) from a String?
Updated:
Please tell me the Unicode chara
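The black diamond with a question mark is U+FFFD, the Unicode replacement character, which a decoder substitutes for bytes it cannot interpret. Most probably the text you received was encoded in something other than UTF-8 (for example Latin-1) but was read as UTF-8. Rather than stripping the character afterwards, you can refuse uploads that are not valid UTF-8: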
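// Requires java.io.*, java.nio.charset.*, and Apache Commons IO (IOUtils).
CharsetDecoder charsetDecoder = StandardCharsets.UTF_8.newDecoder();
// Fail on invalid byte sequences instead of silently substituting U+FFFD.
charsetDecoder.onMalformedInput(CodingErrorAction.REPORT);
// try-with-resources closes the reader and the underlying file stream.
try (Reader reader = new InputStreamReader(new FileInputStream(filePath), charsetDecoder)) {
    return IOUtils.toString(reader);
}
catch (MalformedInputException e) {
    // The file was not saved with UTF-8 encoding, so reject it.
    throw new IllegalArgumentException("File is not valid UTF-8: " + filePath, e);
}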
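
If you would rather not depend on Commons IO, or the data is already in memory as a byte array, the same strict check can be sketched with the JDK alone. This is only an illustration: the class name StrictUtf8 and its method names are hypothetical, not part of any library. The last method covers the literal question (removing U+FFFD from a String that has already been decoded), but by then the original bytes are gone, so it hides the damage rather than repairing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class StrictUtf8 {

    // Decodes bytes as UTF-8 and throws instead of substituting U+FFFD.
    static String decode(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Returns true if the bytes form a valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decode(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Last resort: drops U+FFFD characters already present in a String.
    static String stripReplacementChars(String text) {
        return text.replace("\uFFFD", "");
    }
}
```

With this, StrictUtf8.isValidUtf8(uploadedBytes) could gate an upload before anything is stored, and stripReplacementChars could clean up text that was already decoded leniently. Where possible, re-reading the source with the correct charset is the better fix.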