Most efficient way to check if a file is empty in Java on Windows

南笙 2020-12-05 04:30

I am trying to check if a log file is empty (meaning no errors) or not, in Java, on Windows. I have tried using 2 methods so far.

Method 1 (Failure)

12 answers
  •  慢半拍i (original poster)
    2020-12-05 05:18

    Now both these methods fail at times when the log file is empty (has no content), yet the file size is not zero (2 bytes).

    Actually, I think you will find that the file is NOT empty. Rather I think that you will find that those two characters are a CR and a NL; i.e. the file consists of one line that is empty.
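
    One way to confirm this is to dump the raw bytes of the log file. The following is only a minimal sketch: the class name LogBytes and the default path errors.log are placeholders, not anything from the question. For a file holding a single Windows line terminator it should print 0d 0a.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LogBytes {
    // Formats bytes as space-separated two-digit hex, e.g. "0d 0a".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "errors.log" is a hypothetical default path; pass the real one as an argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "errors.log");
        // Reading the whole file is acceptable here because it is expected to be tiny.
        System.out.println(toHex(Files.readAllBytes(log)));
    }
}
```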

    If you want to test if a file is either empty or has a single empty line then a simple, relatively efficient way is:

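    // Needs java.io.BufferedReader and java.io.FileReader.
    try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
        String line = br.readLine();
        // Empty file (no first line), or exactly one empty line and nothing after it.
        if (line == null ||
            (line.length() == 0 && br.readLine() == null)) {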
            System.out.println("NO ERRORS!");
        } else {
            System.out.println("SOME ERRORS!");
        }
    }
    

    Can we do this more efficiently? Possibly. It depends on how often you have to deal with the three different cases:

    • a completely empty file
    • a file consisting of a single empty line
    • a file with a non-empty line, or multiple lines.

    You can probably do better by using File.length() (or Files.size()) and / or reading just the first two bytes. However, the problems include:

    • If you both test the file size AND read the first few bytes then you are making 2 syscalls.
    • The actual line termination sequence could be CR, NL or CR NL, depending on the platform. (I know you say this is for Windows, but what happens if you need to port your application? Or if someone sends you a non-Windows file?)
    • It would be nice to avoid setting up a stream / reader stack, but the file's character encoding could map CR and NL to something other than the bytes 0x0d and 0x0a. (For example ... UTF-16)
    • Then there's the annoying habit that some Windows utilities have of putting BOM markers into UTF-8 encoded files. (This would even mess up the simple version above!)

    All of this means that the most efficient possible solution is going to be rather complicated.
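
    As a rough illustration of what a byte-level check might look like, here is a sketch that opens the file once, reads at most six bytes, skips a UTF-8 BOM if present, and accepts CR, NL or CR NL as the only content. It assumes an ASCII-compatible encoding (it will not work for UTF-16), and the names EmptyLogCheck, isBlankPrefix and isEmptyOrBlankLine are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class EmptyLogCheck {
    // 3 BOM bytes + 2 line-terminator bytes + 1 extra byte to detect further content.
    private static final int MAX_PREFIX = 6;

    // Decides from the first 'len' bytes whether the file is empty or one empty line.
    // Assumption: ASCII-compatible encoding, so CR is 0x0d and NL is 0x0a.
    static boolean isBlankPrefix(byte[] buf, int len) {
        int pos = 0;
        // Skip a UTF-8 BOM (EF BB BF) if the file starts with one.
        if (len >= 3 && (buf[0] & 0xff) == 0xEF
                && (buf[1] & 0xff) == 0xBB
                && (buf[2] & 0xff) == 0xBF) {
            pos = 3;
        }
        int rest = len - pos;
        if (rest == 0) {
            return true; // completely empty (apart from an optional BOM)
        }
        if (rest == 1) {
            return buf[pos] == '\r' || buf[pos] == '\n'; // lone CR or NL
        }
        if (rest == 2) {
            return buf[pos] == '\r' && buf[pos + 1] == '\n'; // CR NL
        }
        return false; // anything longer is a non-empty line or several lines
    }

    static boolean isEmptyOrBlankLine(Path file) throws IOException {
        byte[] buf = new byte[MAX_PREFIX];
        int len = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            int n;
            while (len < buf.length
                    && (n = in.read(buf, len, buf.length - len)) > 0) {
                len += n;
            }
        }
        return isBlankPrefix(buf, len);
    }

    public static void main(String[] args) throws IOException {
        // The path comes from the command line; nothing here is specific to the question.
        Path log = Paths.get(args[0]);
        System.out.println(isEmptyOrBlankLine(log) ? "NO ERRORS!" : "SOME ERRORS!");
    }
}
```

    This avoids the separate size check and the reader stack, at the cost of hard-coding assumptions about the encoding. Whether it is actually faster than the BufferedReader version would need to be measured.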
