Maven UTF-8 encoding issue

强颜欢笑 posted on 2020-07-06 13:23:31

Question


When I run the code below in two different projects, I get different output.

    String myString = "Türkçe Karakter Testi : ğüşiöçĞÜİŞÇÖĞ";
    String value = new String(myString.getBytes("UTF-8"));
    System.out.println(value);

The first project is a non-Maven Java application created in NetBeans 8.2. It gives me the following result, which is what I expect.

"Türkçe Karakter Testi : ğüşiöçĞÜİŞÇÖĞ"

The second project is a Maven Java application created the same way, with the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>mavenproject1</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
</project>

This project gives me:

"Türkçe Karakter Testi : ğüşiöçÄ?ÜİÅ?ÇÖÄ?"

I checked both source files with Notepad++, and both are encoded as UTF-8.


Answer 1:


You're missing the encoding from your new String() constructor, so it's using the default encoding of your platform which isn't UTF-8 (looks like some variant of ISO-8859-1).

If you use the following code (which doesn't make much sense, but shows the default encoding botching things), you'll see that it's printed properly everywhere.

String myString = "Türkçe Karakter Testi : ğüşiöçĞÜİŞÇÖĞ";
String value = new String(myString.getBytes("UTF-8"), "UTF-8");
System.out.println(value);

What's the lesson here? Always specify the encoding to use when dealing with byte/character conversion! This includes such methods as String.getBytes(), new String() and new InputStreamReader().
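
As a minimal sketch of that advice (the class and method names are illustrative, not from the question), the round trip can be written with the StandardCharsets constants so that neither call falls back to the platform default. The string uses Unicode escapes only so that the example does not depend on how the source file itself is compiled.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Encodes and decodes with the same explicit charset, so the result
    // does not depend on the platform default encoding.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reproduces the kind of bug in the question: UTF-8 bytes decoded
    // with a single-byte charset (ISO-8859-1 is an assumed stand-in for
    // whatever the platform default happened to be).
    static String decodeAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Türkçe ğş" written with Unicode escapes
        String myString = "T\u00fcrk\u00e7e \u011f\u015f";
        System.out.println(roundTrip(myString).equals(myString));      // prints true
        System.out.println(decodeAsLatin1(myString).equals(myString)); // prints false
    }
}
```

The StandardCharsets overloads also avoid the checked UnsupportedEncodingException that the String-named variants such as getBytes("UTF-8") declare.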

This is just one of the many ways that character encoding can bite you in the behind. It may seem like a simple problem, but it catches unsuspecting developers all the time.




Answer 2:


I have often run into the same problem.


Configuring Maven Character Encoding

The problem

  • I run my code in the IDE (IDEA/Eclipse). Everything is correct: the output has the right encoding both in the console and in the output files.

  • I run my app after building it with Maven. When I run the jar built with mvn clean install, the output is wrongly encoded: I see incorrect and unexpected symbols both in the console and in the files the app generates.

  • A warning appears in the console during the build (shown below). It means that no character encoding has been set for the project, so Maven falls back to the platform default. There are a few options for fixing this.

[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
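
To compare the IDE run with the jar run, it can help to print which default the JVM actually picked up in each case. The following is a small diagnostic sketch (the class and method names are hypothetical) using only standard JDK calls:

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    // Builds a one-line summary of the JVM's default charset and the
    // file.encoding system property it was started with.
    static String describeDefaults() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

If the two runs report different values, any conversion in the application that does not name its charset will behave differently between them.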

Configuring Maven Character Encoding

1. Properties

The most popular and common way to set the Maven character encoding is through properties. They are supported by most plugins and are easy to add: put them in the properties element, which is a child of the project element.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    [...]
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>
    [...]
</project>

2. Maven Resources Plugin

You can also specify the character encoding in the configuration of the Maven Resources Plugin.

The only drawback is that you have to declare this plugin explicitly in your pom.xml file.

Just add this plugin. It has always helped me.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    [...]
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
    [...]
</project>

3. Commandline

If you cannot alter the source of a Maven project, or you need to specify the character encoding on a build server such as Jenkins, Hudson, or Bamboo, you can also pass the encoding on the command line.

mvn -Dproject.build.sourceEncoding=UTF-8 -Dproject.reporting.outputEncoding=UTF-8 clean deploy

4. Maven Options

If you work on a lot of small personal projects, you can also set the encoding globally in MAVEN_OPTS. The drawback is that every other developer who shares your code base has to set the same MAVEN_OPTS, which is why I do not recommend it.

set MAVEN_OPTS= -Dfile.encoding="UTF-8"
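
MAVEN_OPTS configures the JVM that runs Maven, while the built jar is started in its own JVM, so the application can still see a different default. One way to make the console and file output independent of either setting is to name the charset in the application code. This is a sketch under that assumption; the class, method, and file names are hypothetical.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitOutputEncoding {

    // Writes the text as UTF-8 bytes regardless of the platform default.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back, decoding the bytes explicitly as UTF-8.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Wraps System.out in a PrintStream that encodes characters as UTF-8.
    static PrintStream utf8Console() throws UnsupportedEncodingException {
        return new PrintStream(System.out, true, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("encoding-demo", ".txt");
        // "Türkçe ğş" written with Unicode escapes
        String text = "T\u00fcrk\u00e7e \u011f\u015f";
        writeUtf8(file, text);
        utf8Console().println(readUtf8(file));
        Files.delete(file);
    }
}
```

The console line only displays correctly if the terminal itself interprets bytes as UTF-8; the file contents are UTF-8 either way.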

See: How to Configure Maven Character Encoding



Source: https://stackoverflow.com/questions/48278786/maven-utf-8-encoding-issue
