Java App : Unable to read iso-8859-1 encoded file correctly

一个人的身影 2020-12-16 04:27

I have a file that is encoded as ISO-8859-1 and contains characters such as ô.

I am reading this file with Java code, something like:

File in = n         


        
5 Answers
  •  野趣味 (original poster)
    2020-12-16 04:51

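    Parsing the file as fixed-size blocks of bytes is fragile: what if a character's byte representation straddles two blocks? (ISO-8859-1 uses exactly one byte per character, so it cannot happen with this file, but it does with multi-byte encodings such as UTF-8.) Use an InputStreamReader with the appropriate character encoding instead: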

     // requires: import java.io.*;
     // try-with-resources closes the reader even if read() throws
     try (BufferedReader br = new BufferedReader(
              new InputStreamReader(
              new FileInputStream("myfile.csv"), "ISO-8859-1")))
     {
          char[] buffer = new char[4096]; // character (not byte) buffer

          while (true)
          {
               int charCount = br.read(buffer, 0, buffer.length);

               if (charCount == -1) break; // reached end-of-stream

               String s = String.valueOf(buffer, 0, charCount);
               // alternatively, we can append to a StringBuilder

               // print, not println: chunk boundaries are arbitrary,
               // so println would insert spurious line breaks
               System.out.print(s);
          }
     }
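
    To illustrate the straddling problem mentioned above, here is a minimal sketch. It assumes UTF-8 input (where ô takes two bytes) rather than the ISO-8859-1 file from the question, and the class and method names are made up for the demonstration. Decoding each block on its own corrupts a character that is split across two blocks, while a Reader does not:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StraddleDemo {
    // Fragile approach: decode every fixed-size block of bytes independently.
    static String decodeInBlocks(byte[] data, int blockSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Robust approach: let a Reader decode the byte stream as a whole.
    static String decodeWithReader(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[8];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "c\u00F4te"; // "côte"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // 63 C3 B4 74 65

        // With 2-byte blocks, C3 and B4 land in different blocks.
        System.out.println(decodeInBlocks(utf8, 2).equals(text)); // false
        System.out.println(decodeWithReader(utf8).equals(text));  // true
    }
}
```

    With a block size of 2, the two bytes of ô end up in different blocks and each half is replaced by U+FFFD; the Reader version returns the original string.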
    

    Btw, remember to check that your console or terminal can actually display the Unicode character correctly. You could also redirect the program output to a file and then compare it with the original file.
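
    The file comparison can also be done in code, which takes the console out of the picture. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and it works on a byte array rather than on the actual file from the question. It decodes the bytes with a given charset, encodes them again, and checks whether the bytes come back unchanged:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {
    // True if decoding with the charset and re-encoding reproduces the bytes.
    static boolean survivesRoundTrip(byte[] original, Charset charset) {
        String text = new String(original, charset);
        byte[] again = text.getBytes(charset);
        return Arrays.equals(original, again);
    }

    public static void main(String[] args) {
        // "côte" as ISO-8859-1 bytes: ô is the single byte 0xF4.
        byte[] latin1 = { 'c', (byte) 0xF4, 't', 'e' };

        // Correct charset: the bytes survive.
        System.out.println(survivesRoundTrip(latin1, StandardCharsets.ISO_8859_1));
        // Wrong charset: 0xF4 is not valid UTF-8 here, so the text is damaged.
        System.out.println(survivesRoundTrip(latin1, StandardCharsets.UTF_8));
    }
}
```

    If the check fails for the real file, the charset used for reading (or writing) is probably not the one the file was saved with.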

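    As Jon Skeet suggests, the problem may also be console-related. Try System.console().printf("%s", s) to see if there is a difference. (Passing s directly as the format string breaks if the file contains a % sign, and System.console() returns null when no console is attached.)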
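
    To see why the console matters, it helps to look at which bytes actually leave the program for a given output encoding. The following is a rough sketch, with a hypothetical helper name, that prints ô through a PrintStream with an explicit encoding and shows the resulting bytes; it also tries the console only when one is attached:

```java
import java.io.ByteArrayOutputStream;
import java.io.Console;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class ConsoleCheck {
    // Returns the bytes a PrintStream with the given encoding emits for s.
    static byte[] bytesPrinted(String s, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(s);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + encoding, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String s = "\u00F4"; // ô

        Console console = System.console(); // null when output is redirected
        if (console != null) {
            console.printf("%s%n", s);
        } else {
            System.out.println("no console attached");
        }

        System.out.println(Arrays.toString(bytesPrinted(s, "UTF-8")));      // [-61, -76]
        System.out.println(Arrays.toString(bytesPrinted(s, "ISO-8859-1"))); // [-12]
    }
}
```

    If the terminal expects one of these byte sequences and the program emits the other, the character shows up garbled even though the file was read correctly.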
