Java Scanner(File) misbehaving, but Scanner(FileInputStream) always works with the same file

不羁的心 submitted on 2019-11-27 02:41:53

Question


I am seeing weird behavior with Scanner. With a particular set of files, it works when I use the Scanner(FileInputStream) constructor, but not when I use the Scanner(File) constructor.

Case 1: Scanner(File)

Scanner s = new Scanner(new File("file"));
while(s.hasNextLine()) {
    System.out.println(s.nextLine());
}

Result: no output

Case 2: Scanner(FileInputStream)

Scanner s = new Scanner(new FileInputStream(new File("file")));
while(s.hasNextLine()) {
    System.out.println(s.nextLine());
}

Result: the file content outputs to the console.

The input file is a Java source file containing a single class.

I double checked programmatically (in Java) that:

  • the file exists,
  • is readable,
  • and has a non-zero filesize.

Scanner(File) typically works for me in this situation; I am not sure why it doesn't now.
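For reference, the three checks listed above could be done along these lines. This is only a minimal sketch, not the asker's actual code: the class name FileChecks and the helper describe are made up for illustration, and "file" is the placeholder name from the snippets above.

```java
import java.io.File;

public class FileChecks {
    // Hypothetical helper: summarises the three checks from the question.
    static String describe(File f) {
        return "exists=" + f.exists()
                + ", readable=" + f.canRead()
                + ", size=" + f.length();
    }

    public static void main(String[] args) {
        // "file" is the placeholder file name used in the question.
        System.out.println(describe(new File("file")));
    }
}
```

None of these checks looks at the bytes themselves, so they say nothing about how the content decodes into characters.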


Answer 1:


hasNextLine() calls findWithinHorizon(), which in turn calls findPatternInBuffer(), searching for a match of the line-terminator pattern defined as .*(\r\n|[\n\r\u2028\u2029\u0085])|.+$

The strange thing is that with both ways of constructing a Scanner (from a FileInputStream or from a File), findPatternInBuffer returns a positive match if the file contains, regardless of its size, a line terminator such as 0x0A. But if the file also contains a byte outside the ASCII range (i.e. >= 0x80), hasNextLine() returns true with FileInputStream and false with File.

Very simple test case:

Create a file that contains just the character "a" followed by a line feed:

# hexedit file     
00000000   61 0A                                                a.

# java Test.java
using File: true
using FileInputStream: true

Now edit the file with hexedit and append the byte 0x80:

# hexedit file
00000000   61 0A 80                                             a..

# java Test.java
using File: false
using FileInputStream: true

The Java test code contains nothing beyond what is already in the question:

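import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

public class Test {
    public static void main(String[] args) {
        File file1 = new File("file");
        File file2 = new File("file");
        // try-with-resources closes both scanners and the streams behind them
        try (Scanner s1 = new Scanner(file1);
             Scanner s2 = new Scanner(new FileInputStream(file2))) {
            // Case 1: Scanner built directly from the File
            System.out.println("using File: " + s1.hasNextLine());
            // Case 2: Scanner built from a FileInputStream over the same file
            System.out.println("using FileInputStream: " + s2.hasNextLine());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}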

So it turns out this is a charset issue. In fact, changing the test to:

 Scanner s1 = new Scanner(file1, "latin1");

we get:

# java Test 
using File: true
using FileInputStream: true
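Scanner does not throw when its underlying source fails to read; it records the error and exposes the last one through ioException(). The sketch below uses that to make the charset problem visible. It is an illustration under stated assumptions, not part of the original test: the class name ScannerCharsetDemo, the helper probe, and the temporary file are made up, and the charset is passed explicitly as "UTF-8" rather than relying on the platform default, so the result does not depend on the machine it runs on.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerCharsetDemo {
    // Hypothetical helper: reports whether a line is available and which
    // IOException (if any) the Scanner recorded while reading its source.
    static String probe(Scanner s) {
        try {
            boolean has = s.hasNextLine();
            IOException e = s.ioException();
            return has + " (ioException: "
                    + (e == null ? "none" : e.getClass().getSimpleName()) + ")";
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Same three bytes as in the hexedit dump: 'a', line feed, 0x80.
        File f = File.createTempFile("scanner-demo", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), new byte[] {0x61, 0x0A, (byte) 0x80});

        // Assumption: UTF-8 stands in for the platform default charset.
        System.out.println("File, UTF-8:            " + probe(new Scanner(f, "UTF-8")));
        System.out.println("File, ISO-8859-1:       " + probe(new Scanner(f, "ISO-8859-1")));
        System.out.println("FileInputStream, UTF-8: "
                + probe(new Scanner(new FileInputStream(f), "UTF-8")));
    }
}
```

When Scanner(File) silently produces nothing, checking ioException() after the first hasNextLine() call is a cheap way to find out whether a decoding error was swallowed.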



Answer 2:


Looking at the Scanner implementation in Oracle/Sun JDK 1.6.0_23, the Scanner(File) constructor opens a FileInputStream, which is meant for raw binary data.

This points to a difference in the buffering and parsing technique used depending on which constructor is invoked, which directly affects the call to hasNextLine() in your code.

Scanner(InputStream) uses an InputStreamReader, while Scanner(File) reads through a ByteChannel obtained from its FileInputStream (and, in your case, probably reads the whole file in one go, thus advancing the cursor).
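The two reading paths can be compared without Scanner at all. The sketch below assumes, based on the OpenJDK sources I am aware of, that Scanner(File) builds its reader roughly like Channels.newReader(channel, decoder, -1) with a freshly created decoder, whereas Scanner(InputStream) wraps an InputStreamReader. A fresh CharsetDecoder reports malformed input as an error by default, while InputStreamReader in practice substitutes U+FFFD. The class name DecoderPolicyDemo and its helpers are made up for illustration, and UTF-8 is an assumed stand-in for the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;

public class DecoderPolicyDemo {
    // 'a', line feed, then 0x80, which is not valid UTF-8 on its own.
    static final byte[] DATA = {0x61, 0x0A, (byte) 0x80};

    // Reads the reader to the end. Returns the decoded text, or the
    // exception class name if decoding fails at any point.
    static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = r) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Path assumed for Scanner(InputStream): an InputStreamReader.
    static String viaInputStreamReader() {
        return drain(new InputStreamReader(
                new ByteArrayInputStream(DATA), StandardCharsets.UTF_8));
    }

    // Path assumed for Scanner(File): a channel reader with a fresh decoder,
    // whose default action on malformed input is to report an error.
    static String viaChannelReader() {
        return drain(Channels.newReader(
                Channels.newChannel(new ByteArrayInputStream(DATA)),
                StandardCharsets.UTF_8.newDecoder(), -1));
    }

    public static void main(String[] args) {
        System.out.println("InputStreamReader: "
                + viaInputStreamReader().replace("\n", "\\n"));
        System.out.println("Channel reader:    " + viaChannelReader());
    }
}
```

If that assumption holds, it would explain both answers at once: the File path hits a decoding error that Scanner swallows, and the stream path quietly replaces the bad byte and carries on.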



Source: https://stackoverflow.com/questions/9492520/java-scannerfile-misbehaving-but-scannerfileinputstream-always-works-with-t
