Byte order mark screws up file reading in Java


说谎 2020-11-22 02:55

I'm trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all do. When present, the byte order gets read along with the r

9 Answers

面向向阳花 (original poster)

2020-11-22 03:49
I had the same problem, and because I wasn't reading in a bunch of files I went with a simpler solution. I think my encoding was UTF-8, because when I printed out the offending character with the help of this page: Get unicode value of a character, I found that it was \ufeff. I used the code System.out.println( "\\u" + Integer.toHexString(str.charAt(0) | 0x10000).substring(1) ); to print out the offending Unicode value.
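For anyone who wants to try that diagnostic on its own, here is a minimal sketch that wraps the same expression in a method. The class name `UnicodeEscape`, the method name `escapeOf`, and the sample string are made up for illustration; only the expression itself comes from the answer above.

```java
// Hypothetical helper class; the names are illustrative, not from any library.
final class UnicodeEscape {

    // OR-ing with 0x10000 forces a five-digit hex string, so dropping the
    // first digit leaves exactly four zero-padded hex digits.
    static String escapeOf(char c) {
        return "\\u" + Integer.toHexString(c | 0x10000).substring(1);
    }

    public static void main(String[] args) {
        // Assumed sample: a header line that starts with a byte order mark (U+FEFF).
        String str = "\uFEFFid,name";
        // Prints the escape form of the first character (hex digits feff).
        System.out.println(escapeOf(str.charAt(0)));
        // An ordinary character for comparison (hex digits 0069 for 'i').
        System.out.println(escapeOf(str.charAt(1)));
    }
}
```

If the first character of your line prints as feff, the file starts with a byte order mark.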

Once I had the offending Unicode value, I removed it from the first line of my file before I went on reading. The relevant code:
```
// Remove the BOM (U+FEFF) first, then trim, so whitespace right after the BOM is also dropped.
String str = reader.readLine();
str = str.replace("\ufeff", "").trim();
```
This fixed my problem, and I was able to go on processing the file with no issue. I added trim() just in case of leading or trailing whitespace; whether you need it depends on your use case. Note that trim() on its own does not strip \ufeff, so the replace call is still required.
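A slightly more defensive variant of the same idea, as a sketch: it removes the mark only when it is the very first character, so a \ufeff that legitimately appears later in the line is left alone. The class name `BomStripper`, its method names, and the in-memory sample data are assumptions made for illustration; in real use the reader would wrap your CSV file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper class; the names are illustrative only.
final class BomStripper {

    private static final char BOM = '\uFEFF';

    // Removes a single leading U+FEFF if present; everything else is untouched.
    static String stripLeadingBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    // Reads the first line and strips a leading BOM.
    // Returns null for empty input, mirroring BufferedReader.readLine().
    static String readFirstLine(BufferedReader reader) throws IOException {
        return stripLeadingBom(reader.readLine());
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data standing in for a CSV file that starts with a BOM.
        BufferedReader reader =
                new BufferedReader(new StringReader("\uFEFFid,name\n1,Ada\n"));
        System.out.println(readFirstLine(reader)); // prints "id,name"
        System.out.println(reader.readLine());     // prints "1,Ada"
    }
}
```

Only the first line needs this treatment, since a byte order mark can only occur at the start of the stream; later lines can be read with plain readLine().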