Maven: Source Encoding in UTF-8 not working?

南旧 2020-12-13 03:38

I am converting a project from Ant to Maven and I'm having problems with a specific unit test that deals with UTF-8 characters. The problem is about the following String:

5 answers
  •  無奈伤痛
    2020-12-13 04:28

    1. When debugging Unicode problems, convert everything to ASCII so you can read and understand what is inside a String without guesswork. For example, use StringEscapeUtils from commons-lang3 to turn ä into \u00e4. That way, when you see a ?, you know it is really in the String and not just a character the console can't print. You can also distinguish " " (\u0020) from " " (\u00a0).

      In the test case, check the escaped version of the inputs as early as possible to make sure the data is actually what you expect.

      So the code above should be:

      assertEquals("\u010d\u00e4\u....", escape(l_string));
      
    2. Make sure you use the correct encoding for file I/O. Never rely on Java's default encoding; always use InputStreamReader/OutputStreamWriter and specify the encoding explicitly.

    3. The POM looks correct. Run mvn with -X to verify that it picks up these settings and passes the correct options to the Java compiler. mvn help:effective-pom might also help.

    4. Disassemble the class file to check the strings. Java will use ? to denote that it couldn't read something.

      If you get the ? from System.out.println( ">>> " + l_string );, this means the code wasn't compiled with UTF-8, or the source file may have been saved with another Unicode encoding (UTF-16 or similar).

      Another source of problems could be the properties file. Make sure it was saved with ISO-8859-1 and that it wasn't modified by the compilation process.

    5. Make sure Maven actually compiles your file. Use mvn clean to force a full recompile.
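
Steps 1 and 2 can be sketched together in plain Java. This is only an illustration: the class EncodingCheck and the methods escape, writeUtf8 and readUtf8 are hypothetical names, and escape is a simplified stand-in for StringEscapeUtils from commons-lang3 (its output format may differ from the library's). The test string is written with Unicode escapes so that the source file itself stays pure ASCII and does not depend on the compiler's encoding setting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original question or answer.
class EncodingCheck {

    // Simplified stand-in for StringEscapeUtils: keeps printable ASCII as-is
    // and turns every other char into a backslash-u escape with 4 hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Writes text with an explicit encoding instead of the platform default.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Reads text with an explicit encoding instead of the platform default.
    static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8))) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two non-ASCII letters, a plain space, and a no-break space.
        String original = "\u010d\u00e4 \u00a0";
        Path tmp = Files.createTempFile("encoding-check", ".txt");
        try {
            writeUtf8(tmp, original);
            // Print the escaped form so the console encoding cannot hide anything.
            System.out.println(escape(readUtf8(tmp)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Comparing escaped strings in an assertion makes a failure readable in any console or build log, because the expected and actual values are both plain ASCII.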
