I\'m making a cross-platform application that renames files based on data retrieved online. I\'d like to sanitize the Strings I took from a web API for the current platform.
This is based on the accepted answer by Sarel Botha which works fine as long as you don't encounter any characters outside of the Basic Multilingual Plane. If you need full Unicode support (and who doesn't?) use this code instead which is Unicode safe:
public class FileNameCleaner {
final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};
static {
Arrays.sort(illegalChars);
}
public static String cleanFileName(String badFileName) {
StringBuilder cleanName = new StringBuilder();
int len = badFileName.codePointCount(0, badFileName.length());
for (int i=0; i
Key changes here:
length instead of just lengthcharAtappendchars to ints. In fact, you should never deal with chars as they are basically broken for anything outside the BMP.