I\'m struggling with a strange file name encoding issue when listing directory contents in Java 6 on both OS X and Linux: the File.listFiles() and related metho
On Unix file-system, a file name really is a null-terminated byte[]. So the java runtime has to perform conversion from java.lang.String to byte[] during the createNewFile() operation. The char-to-byte conversion is governed by the locale. I've been testing setting LC_ALL to en_US.UTF-8 and en_US.ISO-8859-1 and got coherent results. This is with Sun (...Oracle) java 1.6.0_20. However, For LC_ALL=en_US.POSIX, the result is:
File name: Tr%C3%AEcky+N%C3%A5me
Listed name: Tr%3Fcky+N%3Fme
3F is a question mark. It tells me that the conversion was not successful for the non-ASCII character. Then again, everything is as expected.
But the reason why your two strings are different is because of the equivalence between the \u00EE character (or C3 AE in UTF-8) and the sequence i+\u0302 (69 CC 82 in UTF-8). \u0302 is a combining diacritical mark (combining circumflex accent). Some sort of normalization occurred during the file creation. I'm not sure if it's done in the Java run-time or the OS.
NOTE: I took me some time to figure it out since the code snippet that you've posted do not have a combining diacritical mark but the equivalent character î (e.g. \u00ee). You should have embedded the Unicode escape sequence in the string literal (but it's easy to say that afterward...).