Java console not reading in Chinese characters correctly

a 夏天 posted on 2020-01-01 09:15:18

Question


I am struggling to get Eclipse to read in Chinese characters correctly, and I am not sure where I may be going wrong.

Specifically, somewhere between reading in a string of Chinese (simplified or traditional) from the console and outputting it, it gets garbled. Even when outputting a large string of mixed text (English/Chinese characters), it appears to only alter the appearance of the Chinese characters.

I have cut it down to the following test example and explicitly annotated it with what I believe is happening at each stage - note that I am a student and would very much like to confirm my understanding (or otherwise) :)

public static void main(String[] args) {    
    try 
    {
        boolean isRunning = true;

        //Raw flow of input data from the console
        InputStream inputStream = System.in;
        //Allows you to read the stream, using either the default character encoding, else the specified encoding;
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
        //Adds functionality for converting the stream being read in, into Strings(?)
        BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);


        //Raw flow of outputdata to the console
        OutputStream outputStream = System.out;
        //Write a stream, from a given bit of text
        OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
        //Adds functionality to the base ability to write to a stream
        BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);



        while(isRunning) {
            System.out.println();//force extra newline
            System.out.print("> ");

            //To read in a line of text (as a String):
            String userInput_asString = input_BufferedReader.readLine();

            //To output a line of text:
            String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
            output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

            String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
            output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

            String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
            output_BufferedWriter.write(outputToUser_fromString_userSupplied);
            output_BufferedWriter.flush();

            System.out.println();//force extra newline

        }
    }
    catch (Exception e) {
        // TODO: handle exception
    }
}

Sample output:

> 之謂甚
foo
之謂甚
之謂甚

> oaea
foo
之謂甚
oaea

> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂甚;

> 

What is shown in this Stack Overflow post matches exactly what I see in the Eclipse console and in the Eclipse debugger (when viewing/editing the variable values). Altering the variable values manually via the Eclipse debugger makes the code that depends on those values behave as I would normally expect, which suggests that the problem lies in how the text is read IN.

I have tried many different combinations of scanners/buffered stream [reader|writer]s etc. to read in and output, with and without explicit character encodings, though I did not do this particularly systematically and could easily have missed something.

I have tried to set the Eclipse environment to use UTF-8 wherever possible, but I guess I could have missed a place or two. Note that the console will correctly output hard-coded Chinese characters.

Any assistance / guidance on this matter is greatly appreciated :)


Answer 1:


It looks like the console is not reading the input correctly. Here is a link that I believe describes your problem and some workarounds.

http://paranoid-engineering.blogspot.com/2008/05/getting-unicode-output-in-eclipse.html

Simple answer: try setting the JVM system property -Dfile.encoding=UTF-8 in your eclipse.ini (JVM arguments there go after the -vmargs line). Before enabling this for the whole of Eclipse, you could set it only in the debug configuration for this program and see if it works.

The link has a lot more suggestions.
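To check whether the setting actually reached the program, a minimal sketch like the one below can be run from the same launch configuration. The class name DefaultCharsetCheck and the describe() helper are made up for illustration; System.getProperty and Charset.defaultCharset() are standard JDK calls.

```java
import java.nio.charset.Charset;

// Hypothetical helper class (not from the original post): reports which
// default encoding the running JVM has picked up.
class DefaultCharsetCheck {

    // Builds a one-line summary of the JVM's encoding settings.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this with and without -Dfile.encoding=UTF-8 and compare.
        System.out.println(describe());
    }
}
```

If the printed values are not UTF-8 when launched from Eclipse, the argument probably did not reach the JVM that runs the program.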




Answer 2:


Try this: in Eclipse, right-click your main class and choose Run As > Run Configurations. Then go to the Common tab and change the encoding to UTF-8. That should work!
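Why the console encoding matters can be sketched without a console at all. The example below is only an illustration, not the asker's code: it assumes the console hands the program bytes in one charset while the program decodes them with another, and the class and method names (EncodingMismatchDemo, roundTrip) are hypothetical. ISO-8859-1 stands in for whatever non-UTF-8 encoding the console might be using.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates a console/program charset mismatch.
class EncodingMismatchDemo {

    // Encodes text the way the "console" would, then decodes the bytes
    // the way the program's InputStreamReader would.
    static String roundTrip(String text, Charset consoleCharset, Charset readerCharset) {
        byte[] bytes = text.getBytes(consoleCharset);
        return new String(bytes, readerCharset);
    }

    public static void main(String[] args) {
        // Unicode escapes for 之謂甚, so the demo does not depend on the
        // encoding of this source file.
        String chinese = "\u4E4B\u8B02\u751A";

        // ASCII text survives even when the two charsets differ.
        System.out.println(roundTrip("foo",
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals("foo"));

        // Chinese text survives when both sides agree on UTF-8.
        System.out.println(roundTrip(chinese,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(chinese));

        // Chinese text is damaged when the console side is not UTF-8.
        System.out.println(roundTrip(chinese,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(chinese));
    }
}
```

This mirrors the symptom in the question: English input comes through intact while Chinese input does not, which is what a mismatch between the console encoding and the "UTF-8" passed to InputStreamReader would produce.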




Answer 3:


This seems to be an encoding problem. There might be two problems here: 1. You haven't enabled the compiler's ability to read anything but ASCII characters; in your case it needs to be able to read UTF-8 characters. 2. You may have deleted certain language packs? This is unlikely, since you are probably able to type Chinese characters.

You should search around and learn how to get your IDE to compile non-ASCII characters correctly. In Python this is done in the code itself; I'm unsure how it is done in Java.
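For Java, the source encoding is normally given to the compiler rather than written in the code, for example with javac's -encoding option (javac -encoding UTF-8). Since the question notes that hard-coded Chinese already prints correctly, the source encoding is probably not the culprit here. A rough way to rule it out is to write the characters as Unicode escapes, which compile the same way under any source encoding. The sketch below uses hypothetical names (UnicodeEscapeDemo, fromEscapes, fromUtf8Bytes) and simply checks that the escaped literal matches the same text decoded from its UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a string literal that does not depend on the
// encoding the compiler assumes for the source file.
class UnicodeEscapeDemo {

    // 之謂甚 written as Unicode escapes (U+4E4B, U+8B02, U+751A).
    static String fromEscapes() {
        return "\u4E4B\u8B02\u751A";
    }

    // The same three characters, decoded from their UTF-8 byte sequences.
    static String fromUtf8Bytes() {
        byte[] utf8 = {
            (byte) 0xE4, (byte) 0xB9, (byte) 0x8B,
            (byte) 0xE8, (byte) 0xAC, (byte) 0x82,
            (byte) 0xE7, (byte) 0x94, (byte) 0x9A
        };
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Both routes yield the same three-character string.
        System.out.println(fromEscapes().equals(fromUtf8Bytes()));
    }
}
```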



Source: https://stackoverflow.com/questions/13882378/java-console-not-reading-in-chinese-characters-correctly
