What are the advantages and disadvantages of reading an entire file into a single String as opposed to reading it line by line?

Submitted by 拟墨画扇 on 2019-12-24 03:14:52

Question


Specifically, my end goal is to store every comma-separated word from the file in a List<String>, and I was wondering which approach I should take.

Approach 1:

String fileContents = new Scanner(new File("filepath")).useDelimiter("\\Z").next();
List<String> list = Arrays.asList(fileContents.split("\\s*,\\s*"));

Approach 2:

Scanner s = new Scanner(new File("filepath")).useDelimiter(",");
List<String> list = new ArrayList<>();
while (s.hasNext()){
    list.add(s.next());
}
s.close();

Answer 1:


Approach #1 will read the entire file into memory. This has a couple of performance-related issues:

  • If the file is big, this uses a lot of memory.
  • Because of the way the characters need to be accumulated by the Scanner.next() call, they may need to be copied 2 or even 3 times.
  • There are other inefficiencies due to the fact that you are using a general pattern matching engine for a very specific purpose.

Approach #3 (which is Approach #1 with the File reading done better) addresses a lot of the efficiency issues, but you still hold the entire file contents in memory.
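The question as quoted here does not show Approach #3, so the following is only a guess at what "Approach #1 with the File reading done better" could look like: a minimal sketch that reads the file with Files.readAllBytes instead of a Scanner. The class name, the method name readWords, and the UTF-8 charset are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class WholeFileSplit {
    // Hypothetical helper: reads the whole file into one String, then splits on commas.
    // Assumes the file is UTF-8 encoded and small enough to fit in memory.
    static List<String> readWords(Path path) throws IOException {
        String contents = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        // trim() drops leading/trailing whitespace (such as a final newline)
        // so the first and last words come out clean.
        return Arrays.asList(contents.trim().split("\\s*,\\s*"));
    }
}
```

This avoids the regex-driven buffering inside Scanner, but the byte array, the String, and the resulting list all exist in memory at some point, so the memory concern remains.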

Approach #2 is best from a memory usage perspective because you don't hold the entire file contents as a single string or buffer [1]. The performance is also likely to be best because (my intuition says) this approach avoids at least one copy of the characters.

However, if this really matters, you should benchmark the alternatives, bearing in mind 2 things:

  • "Premature optimization" is usually wasted effort. (Or to put it another way, the chances are that the performance of this part of your code really doesn't matter. The performance bottleneck is likely somewhere else.)
  • There are a lot of pitfalls in writing Java benchmarks that can lead to bogus performance measures and incorrect conclusions.

The other thing to note is that what you are trying to do (create a list of all "words" in order) does not scale. For a large enough input file, the application will run out of heap space. If you anticipate running this on input files larger than 100 MB or so, it may start to become a concern.

The solution may be to convert your processing into something that is more "stream" based ... so that you don't need to have a list of all words in memory.
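One way to read "stream based" is to hand each word to a callback as soon as it is read, instead of collecting everything in a list. The sketch below does that with the same Scanner technique as Approach #2. The class name, the method name forEachWord, the UTF-8 charset, and the whitespace-tolerant delimiter are assumptions, not part of the original question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.function.Consumer;

class WordStreamer {
    // Hypothetical helper: passes each comma-separated word to `action` one at a time,
    // so the full list of words is never built by this method.
    static void forEachWord(Path path, Consumer<String> action) throws IOException {
        // try-with-resources closes the Scanner (and the underlying file) even on error.
        try (Scanner s = new Scanner(path, StandardCharsets.UTF_8.name())) {
            // Treat a comma plus any surrounding whitespace as the separator.
            s.useDelimiter("\\s*,\\s*");
            while (s.hasNext()) {
                action.accept(s.next());
            }
        }
    }
}
```

A caller could, for example, count occurrences with forEachWord(path, w -> counts.merge(w, 1, Integer::sum)), where counts is a Map<String, Integer>. With this delimiter a newline at the very end of the file stays attached to the last word, so trim the word in the callback if that matters.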

This is essentially the same problem as the problem with Approach #1.


[1] Unless the file is small and fits into the buffer ... and then the whole question is largely moot.




Answer 2:


If you read the entire file into memory when you don't actually need to, you are:

  • wasting time: nothing is processed until you've read the entire file
  • wasting space
  • using a technique that won't scale to large files.

Doing this has nothing to recommend it.




Answer 3:


Approach 1:

Approach 1 runs into the limit on a String's maximum size: a String can be at most Integer.MAX_VALUE characters long, or as long as the largest array the runtime can allocate, whichever is smaller.

Hence, prefer Approach 2 if the file is very large.



Source: https://stackoverflow.com/questions/30552098/what-are-the-advantages-and-disadvantages-of-reading-an-entire-file-into-a-singl
