How to identify/handle text file newlines in Java?

Submitted by 夙愿已清 on 2019-12-06 09:03:39

Question


I get files in different formats coming from different systems that I need to import into our database. Part of the import process is to check the line length to make sure the format is correct. We seem to be having issues with files coming from UNIX systems, where one character is added. I suspect this is due to the carriage return being encoded differently on UNIX and Windows platforms.

Is there a way to detect on which file system a file was created, other than checking the last character on the line? Or maybe a way of reading the files as text and not binary which I suspect is the issue?

Thanks, guys!


Answer 1:


Unix systems use \n line endings, Windows uses \r\n, and classic Mac OS (before OS X) used \r. You cannot detect the file system from the file, and it doesn't matter anyway: I can write \n line endings on Windows if my editor supports it, for example. These are just the conventions on each OS, not requirements.

The proper way, assuming you don't have a function that tokenizes correctly no matter which line ending the file uses, is to search for a \n OR a \r, end the current line there, and strip every following \r or \n from the remaining data before you begin the next line. However, this swallows blank lines, which is a problem if you need to keep them. In that case you have to look at line breaks more carefully:

  • when reading a \n, end the current line and start the next line
  • when reading a \r, end the current line and, if the next char is \n, skip it, and start the next line, otherwise start the new line immediately.
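
The two rules above can be sketched as a small character loop. This is only an illustration, not code from the original answer: the class name LineSplitter and the method splitLines are hypothetical, and it assumes the whole text is already in memory as a String. Blank lines are kept, and a trailing line terminator does not produce an extra empty line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original answer).
final class LineSplitter {

    // Splits text on \n, \r\n or \r while keeping blank lines.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n') {
                // Rule 1: \n ends the current line.
                lines.add(current.toString());
                current.setLength(0);
            } else if (c == '\r') {
                // Rule 2: \r ends the current line ...
                lines.add(current.toString());
                current.setLength(0);
                // ... and a directly following \n belongs to the same break.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        // Keep a final line that has no terminator.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

For example, LineSplitter.splitLines("a\r\nb\n\nc\rd") yields the five lines "a", "b", "", "c" and "d", so the blank line between "b" and "c" survives.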



Answer 2:


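Most of the time Java will handle the different line endings for you: as long as you read through a character stream, BufferedReader.readLine() silently accepts \n (Unix), \r\n (Windows) and \r (classic Mac OS) and returns each line without its terminator. See the docs for java.io.FileReader, java.io.BufferedReader and friends. A character stream also takes care of decoding bytes into characters, but it does not detect the encoding by itself: the one-argument FileReader constructors use the default charset, so if the files come from different systems you may need to specify the charset explicitly.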

If you want to see the line separators explicitly, you'll need to read the file as a byte stream, or at least read it one character at a time instead of using readLine(). See the docs for java.io.FileInputStream, java.io.DataInputStream and friends.
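
As a rough sketch of the byte-stream approach, the following counts how often each kind of line terminator occurs in a stream. The class name LineEndingCounter and its fields are hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where the bytes 0x0D and 0x0A can only mean \r and \n. The counts are only a hint about where a file came from.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of the original answer).
final class LineEndingCounter {
    int lf;    // lone \n   (Unix convention)
    int crlf;  // \r\n      (Windows convention)
    int cr;    // lone \r   (classic Mac OS convention)

    // Reads the stream to the end and tallies the terminators it sees.
    static LineEndingCounter count(InputStream in) throws IOException {
        LineEndingCounter result = new LineEndingCounter();
        boolean pendingCr = false; // true when the previous byte was \r
        int b;
        while ((b = in.read()) != -1) {
            if (pendingCr) {
                pendingCr = false;
                if (b == '\n') {
                    result.crlf++;   // \r followed by \n is one terminator
                    continue;
                }
                result.cr++;         // the earlier \r stood alone
            }
            if (b == '\r') {
                pendingCr = true;
            } else if (b == '\n') {
                result.lf++;
            }
        }
        if (pendingCr) {
            result.cr++;             // file ended with a lone \r
        }
        return result;
    }
}
```

When reading a real file, wrapping the FileInputStream in a java.io.BufferedInputStream avoids one system call per byte.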




Answer 3:


Is there a way to detect on which file system a file was created, other than checking the last character on the line?

No. And even checking the line termination sequence is only a hint. We can easily create files with DOS line termination on UNIX, and vice versa.

Or maybe a way of reading the files as text and not binary which I suspect is the issue?

Yes. Open the file using a file reader, wrap it in a buffered reader, and use the readLine() method to read the file a line at a time. This method recognizes a "\n", "\r" or "\r\n" as a line separator, and hence works for DOS, UNIX and Mac files.

Here's some typical code:

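    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.Reader;

    // Inside a method that declares or handles IOException:
    Reader r = new FileReader("somefile");
    try {
        BufferedReader br = new BufferedReader(r);
        String line;
        // readLine() returns each line without its terminator, or null at end of file
        while ((line = br.readLine()) != null) {
            // process line
        }
    } finally {
        r.close();  // always release the file handle
    }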
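
Applied to the line-length check from the question, the same readLine() loop might look like the sketch below. The class LineLengthCheck and the method badLines are hypothetical names, and the fixed expected length is an assumption about the import format. Because readLine() strips the terminator, a stray \r from a Windows file no longer counts toward the length.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original answer).
final class LineLengthCheck {

    // Returns the 1-based numbers of all lines whose length differs
    // from expectedLength. Closes the given reader when done.
    static List<Integer> badLines(Reader source, int expectedLength) throws IOException {
        List<Integer> bad = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            int lineNo = 0;
            while ((line = br.readLine()) != null) {
                lineNo++;
                // line never contains \r or \n here
                if (line.length() != expectedLength) {
                    bad.add(lineNo);
                }
            }
        }
        return bad;
    }
}
```

For a file on disk, a reader with an explicit charset can be obtained with Files.newBufferedReader(Paths.get("somefile"), StandardCharsets.UTF_8); which charset is correct depends on the system that produced the file.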


Source: https://stackoverflow.com/questions/3022407/how-to-identify-handle-text-file-newlines-in-java
