MalformedInputException with Files.readAllLines()

倖福魔咒の submitted on 2020-01-01 10:48:33

Question


I was iterating over some files, 5328 to be precise. They are ordinary XML files of 60 to 200 lines each. They are first filtered through a simple method, isXmlTestSourceFile, that parses the path.

    Files.walk(Paths.get("/home/me/development/projects/myproject"), FileVisitOption.FOLLOW_LINKS)
            .filter(V3TestsGenerator::isXmlTestSourceFile)
            .filter(V3TestsGenerator::fileContainsXmlTag)

The big question is about the second filter, specifically the method fileContainsXmlTag. For each file I wanted to detect whether a pattern occurs at least once among its lines:

private static boolean fileContainsXmlTag(Path path) {
    try {
        return Files.readAllLines(path).stream().anyMatch(line -> PATTERN.matcher(line).find());
    } catch (IOException e) {
        e.printStackTrace();
    }
    return false;
}

For some files I then get this exception:

java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at java.nio.file.Files.readAllLines(Files.java:3205)
at java.nio.file.Files.readAllLines(Files.java:3242)

But when I use FileUtiles.readLines() instead of Files.readAllLines(), everything works fine.

This is a question of curiosity, so if someone has a clue about what's going on, I'd be glad to hear it.

Thanks


Answer 1:


The one-argument method Files.readAllLines(Path) assumes that the file you are reading is encoded in UTF-8.

If you get this exception, then the file you are reading is most likely encoded in a character encoding other than UTF-8.

Find out which character encoding is used, and use the other readAllLines method, which allows you to specify the character encoding.

For example, if the files are encoded in ISO-8859-1:

return Files.readAllLines(path, StandardCharsets.ISO_8859_1).stream()... // etc.
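
To make the difference concrete, here is a minimal sketch that reproduces the exception and then reads the same file with an explicit charset. The class name CharsetDemo, its helper methods, and the sample content are hypothetical and only serve the illustration; the sketch assumes the problematic files contain ISO-8859-1 bytes, which may not be the case for your files.

```java
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CharsetDemo {

    // Writes one line in which 'é' is stored as the single byte 0xE9 (ISO-8859-1).
    static Path writeLatin1Sample() throws IOException {
        Path file = Files.createTempFile("latin1-sample", ".xml");
        Files.write(file, "<name>caf\u00e9</name>\n".getBytes(StandardCharsets.ISO_8859_1));
        return file;
    }

    // Returns true if the one-argument readAllLines (which decodes as UTF-8) rejects the file.
    static boolean failsAsUtf8(Path file) throws IOException {
        try {
            Files.readAllLines(file);
            return false;
        } catch (MalformedInputException e) {
            return true;
        }
    }

    // Reads the file with the charset it was actually written in.
    static List<String> readLatin1(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        Path file = writeLatin1Sample();
        System.out.println("UTF-8 read fails: " + failsAsUtf8(file));
        System.out.println("ISO-8859-1 read:  " + readLatin1(file));
        Files.delete(file);
    }
}
```

The byte 0xE9 starts a multi-byte sequence in UTF-8, but the byte that follows it is not a valid continuation byte, so the strict decoder behind Files.readAllLines reports malformed input of length 1.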

The method FileUtiles.readLines() (where does that come from?) probably assumes something else (most likely that the files are in your system's default character encoding, which is something other than UTF-8).
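
Another possible contributing factor, offered only as an assumption because the origin of FileUtiles.readLines() is unknown: a reader built on java.io.InputStreamReader does not throw on malformed bytes but replaces them with U+FFFD, whereas the reader used by Files.readAllLines reports them. The sketch below shows that lenient behaviour; the class name LenientReadDemo, its method, and the sample content are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LenientReadDemo {

    // Reads the first line through InputStreamReader, which substitutes
    // the replacement character U+FFFD for malformed bytes instead of throwing.
    static String firstLineLenient(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample file whose 'é' is the single ISO-8859-1 byte 0xE9, invalid as UTF-8 here.
        Path file = Files.createTempFile("latin1-sample", ".xml");
        Files.write(file, "<name>caf\u00e9</name>\n".getBytes(StandardCharsets.ISO_8859_1));

        String line = firstLineLenient(file);
        System.out.println("Contains U+FFFD: " + line.contains("\uFFFD"));
        Files.delete(file);
    }
}
```

With such a reader the pattern match still runs on every line, but any non-ASCII character in the affected files is silently lost, so specifying the correct charset remains the safer fix.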



Source: https://stackoverflow.com/questions/38828830/malformedinputexception-with-files-readalllines
