I have to deal with a directory of about 2 million XML files that need to be processed.
I've already solved the processing itself, distributing the work between machines and threads.
If you can use Java 7, this can be done in the following way, and you won't have those out-of-memory problems.
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

Path path = FileSystems.getDefault().getPath("C:\\path\\with\\lots\\of\\files");

Files.walkFileTree(path, new FileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        // The files are handed to you one at a time; process each one here.
        System.out.println(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        // Stop the walk on the first unreadable file; return CONTINUE to skip it instead.
        return FileVisitResult.TERMINATE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        return FileVisitResult.CONTINUE;
    }
});
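As a side note, if all the files sit in one flat directory, Java 7's Files.newDirectoryStream also reads entries lazily instead of building the whole listing in memory. Here is a minimal sketch, assuming the same path as above and that you only want the *.xml entries:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path dir = Paths.get("C:\\path\\with\\lots\\of\\files");
// Entries are streamed lazily, so the full directory listing is never held in memory.
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.xml")) {
    for (Path file : stream) {
        System.out.println(file); // process each XML file here
    }
}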
You can do that with the Apache Commons IO FileUtils library. No memory problem; I checked it with VisualVM.
import java.io.File;
import java.util.Iterator;
import org.apache.commons.io.FileUtils;

// null = no extension filter (all files), true = recurse into subdirectories
Iterator<File> it = FileUtils.iterateFiles(folder, null, true);
while (it.hasNext()) {
    File fileEntry = it.next(); // the iterator is already typed as File, no cast needed
    // process fileEntry here
}
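Since your files are XML, you could also filter by extension directly; a small sketch (the "xml" filter is my assumption, not part of the original snippet):

// Only iterate over files with the .xml extension, recursively.
Iterator<File> xmlFiles = FileUtils.iterateFiles(folder, new String[] { "xml" }, true);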
Hope that helps. Bye.
Why do you store 2 million files in the same directory anyway? I can imagine it already slows down access terribly at the OS level.
I would definitely want to have them divided into subdirectories (e.g. by date/time of creation) before processing. But if that is not possible for some reason, could it be done during processing? E.g. move 1000 files queued for Process1 into Directory1, another 1000 files for Process2 into Directory2, and so on. Then each process/thread sees only the (limited number of) files portioned out to it; see the sketch below.
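A minimal sketch of that batching idea in Java 7, assuming a flat source directory of XML files; the names workerCount, batchSize and the "DirectoryN" naming are hypothetical, not from the original post:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

Path source = Paths.get("C:\\path\\with\\lots\\of\\files"); // the flat directory with the XML files
int workerCount = 8;   // hypothetical number of processes/threads
int batchSize = 1000;  // files per subdirectory, as suggested above
int moved = 0;

try (DirectoryStream<Path> stream = Files.newDirectoryStream(source, "*.xml")) {
    for (Path file : stream) {
        // Blocks of batchSize files go to Directory1, Directory2, ... in round-robin order.
        int worker = (moved / batchSize) % workerCount + 1;
        Path targetDir = source.resolve("Directory" + worker);
        Files.createDirectories(targetDir);
        Files.move(file, targetDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        moved++;
    }
}

Each process/thread would then open only its own DirectoryN, so it never has to enumerate the full 2 million entries.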