Byte order mark screws up file reading in Java

说谎 2020-11-22 02:55

I'm trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all. When present, the byte order mark gets read along with the r

9 answers
  •  面向向阳花
    2020-11-22 03:49

    I had the same problem, and because I wasn't reading in a large batch of files, I went with a simpler solution. I think my encoding was UTF-8: when I printed out the offending character with the help of this page: Get unicode value of a character, I found that it was \ufeff, the byte order mark. I used the code System.out.println( "\\u" + Integer.toHexString(str.charAt(0) | 0x10000).substring(1) ); to print out the offending unicode value.
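For anyone who wants to reproduce that check, here is a minimal sketch. The class name BomInspector and the helper toUnicodeEscape are made up for illustration and are not from the original post. It decodes the three bytes EF BB BF (the UTF-8 encoding of U+FEFF) and prints the first character in escaped form.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
final class BomInspector {
    // Formats a char as a four-digit Java-style escape (backslash, 'u', hex digits).
    static String toUnicodeEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // EF BB BF is how U+FEFF is encoded in UTF-8; "id" stands in for CSV content.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'i', 'd'};
        // Java's UTF-8 decoder keeps the BOM as the first char of the string.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(toUnicodeEscape(decoded.charAt(0)));
    }
}
```

If the first character of your own first line prints the same way, the file most likely starts with a BOM.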

    Once I had the offending unicode value, I replaced it in the first line of my file before I went on reading. The business logic of that section:

    // Remove the BOM before trimming: trim() does not strip U+FEFF.
    String str = reader.readLine();
    str = str.replace("\ufeff", "").trim();
    

    This fixed my problem, and I was then able to go on processing the file with no issue. I added trim() just in case of leading or trailing whitespace; whether you need it depends on your specific requirements.
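A slightly more defensive variant of the same idea, as a rough sketch: it removes U+FEFF only when it is the very first character of the first line, and it copes with an empty file. The names BomStripper, stripLeadingBom and readLines are hypothetical and not from the original answer, and the StringReader in main is just a stand-in for a real file reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
final class BomStripper {
    private static final char BOM = '\uFEFF';

    // Removes a single leading U+FEFF and leaves the rest of the line untouched.
    static String stripLeadingBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    // Reads all lines, stripping a BOM from the first line only.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();
            if (line != null) { // null means the input was empty
                lines.add(stripLeadingBom(line));
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a CSV file that starts with a BOM.
        List<String> lines = readLines(new StringReader("\uFEFFid,name\n1,Alice\n"));
        System.out.println(lines);
    }
}
```

Unlike replace("\ufeff", ""), this leaves any U+FEFF that appears later in the line alone, which may or may not be what you want.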
