I am converting a project from Ant to Maven and I'm having problems with a specific unit test which deals with UTF-8 characters. The problem is about the following String:
When debugging Unicode problems, convert everything to ASCII so you can read what is inside a String without guesswork. For example, use StringEscapeUtils from commons-lang3 to turn ä into \u00e4. That way, you can tell whether a ? is really part of the String or only shows up because the console can't print the character. You can also distinguish a regular space " " (\u0020) from a non-breaking space " " (\u00a0).
In the test case, check the escaped version of the inputs as early as possible to make sure the data is actually what you expect.
So the code above should be:
assertEquals("\\u010d\\u00e4\\u....", escape(l_string));
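The escape helper in the assertion above is not defined in this answer. A minimal sketch of what it could look like, assuming it should leave printable ASCII untouched and turn every other char into a lowercase backslash-u escape (the class name AsciiEscape is made up for illustration, and StringEscapeUtils.escapeJava from commons-lang3 may format its output differently):

```java
class AsciiEscape {
    // Hypothetical helper: keeps printable ASCII (0x20..0x7e) as-is and
    // replaces every other char with a backslash-u escape of four hex digits.
    // Note: it does not escape the backslash itself.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The inputs are written as escapes so the demo does not depend
        // on the encoding of this source file.
        System.out.println(escape("\u010d\u00e4")); // prints \u010d\u00e4
        System.out.println(escape("a b"));          // prints a b
        System.out.println(escape("a\u00a0b"));     // prints a\u00a0b
    }
}
```

Because the output is pure ASCII, it survives any console or log file, and a non-breaking space is no longer indistinguishable from a regular one.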
Make sure you use the correct encoding for file I/O. Never rely on Java's default encoding; always use InputStreamReader/OutputStreamWriter and specify the encoding explicitly.
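As a rough illustration of the advice above, the following sketch writes and reads a temporary file with an explicit UTF-8 charset. The class and method names (ExplicitEncodingIo, write, read, roundTrip) are invented for this example, and UTF-8 is an assumption; use whatever encoding your files actually have.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitEncodingIo {
    // Writes text with an explicitly named charset instead of the platform default.
    public static void write(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Reads the whole file back, again naming the charset explicitly.
    public static String read(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper for the demo: write to a temp file and read it back.
    public static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                write(tmp, text);
                return read(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u010d\u00e4";
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```

With both sides pinned to the same charset, the round trip no longer depends on the file.encoding of the JVM that happens to run the test.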
The POM looks correct. Run mvn with -X to make sure it picks up the correct options and runs the Java compiler with them. mvn help:effective-pom might also help.
Disassemble the class file (for example with javap -v, which prints the constant pool) to check the strings. Java will use ? to denote that it couldn't read something.
If you get the ? from System.out.println( ">>> " + l_string );, this means the code wasn't compiled with UTF-8 or that the source file was saved with another Unicode encoding (UTF-16 or similar).
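To see what such a mismatch does to a string, here is a small simulation, not what javac literally does internally: it encodes text as UTF-8 and then decodes the bytes with a different charset, which is roughly what happens when a UTF-8 source file is read with the wrong encoding. WrongCharsetDemo and decodeAs are made-up names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Encodes the text as UTF-8, then decodes those bytes with another charset,
    // simulating a reader that assumes the wrong encoding.
    public static String decodeAs(String text, Charset assumed) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, assumed);
    }

    // Prints each char as a hex code so the result is readable on any console.
    static String codes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04x ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // a-umlaut, bytes C3 A4 in UTF-8
        // Read as ISO-8859-1: each byte becomes its own character.
        System.out.println(codes(decodeAs(original, StandardCharsets.ISO_8859_1))); // prints 00c3 00a4
        // Read as US-ASCII: bytes above 0x7f cannot be mapped and are replaced.
        System.out.println(codes(decodeAs(original, StandardCharsets.US_ASCII)));   // prints fffd fffd
    }
}
```

If the escaped form of l_string shows patterns like these instead of the expected code points, the damage happened before the string reached the console.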
Another source of problems could be the properties file. Make sure it was saved with ISO-8859-1 and that it wasn't modified by the compilation process.
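For the properties file, a quick check along these lines can help. The sketch assumes the file is loaded with Properties.load(InputStream), which reads ISO-8859-1 and decodes backslash-u escapes; the key name and the class PropertiesEncodingDemo are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {
    // Treats fileContent as the text of a properties file saved in ISO-8859-1,
    // loads it through Properties.load(InputStream) and returns one value.
    public static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        byte[] bytes = fileContent.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The file content holds literal backslash-u escapes, which is the safe
        // way to store characters outside ISO-8859-1 in a properties file.
        String value = loadValue("name=\\u010d\\u00e4", "name");
        System.out.println(value.equals("\u010d\u00e4")); // prints true
    }
}
```

Characters that ISO-8859-1 cannot represent, such as \u010d, only survive if they are written as escapes in the file.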
Make sure Maven actually compiles your file. Run mvn clean before the build (for example mvn clean test) to force a full recompile.