How to quickly find added / removed files?

情深已故 2020-12-25 08:28

I am writing a small program that builds an index of all the files in my directories. It iterates over every file on disk and stores it in a searchable database.

10 answers

春和景丽 (original poster)

2020-12-25 08:45
Given that we do not want to monitor file system events, could we just keep track of the (name, size, time, checksum) of each file? Computing the file checksum (or cryptographic hash, if you prefer) is going to be the bottleneck. You could compute it once in the initial run and re-compute it afterwards only when necessary (e.g. when files match on the other three attributes). Of course, we don't need to bother with this if we only want to track file names and not file content.
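
To make the idea concrete, here is a minimal sketch of comparing two snapshots. It assumes each snapshot is a map from path string to an entry holding size, modification time, and checksum. The names `SnapshotDiff`, `Entry`, and `contentChanged` are made up for illustration, and SHA-256 is just one possible choice of hash. Added and removed files fall out of a key-set difference with no file reads at all; the hash is computed only when size and time both match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that compares two snapshots of a directory tree. */
final class SnapshotDiff {

    /** Metadata kept per file. The checksum may be null if it was never computed. */
    static final class Entry {
        final long size;
        final long modifiedMillis;
        final String checksum;

        Entry(long size, long modifiedMillis, String checksum) {
            this.size = size;
            this.modifiedMillis = modifiedMillis;
            this.checksum = checksum;
        }
    }

    /** Paths that are keys of current but not of previous. */
    static Set<String> added(Map<String, Entry> previous, Map<String, Entry> current) {
        Set<String> result = new TreeSet<>(current.keySet());
        result.removeAll(previous.keySet());
        return result;
    }

    /** Paths that are keys of previous but not of current. */
    static Set<String> removed(Map<String, Entry> previous, Map<String, Entry> current) {
        return added(current, previous);
    }

    /** Cheap check: a different size or time means the file changed, no hashing needed. */
    static boolean metadataDiffers(Entry a, Entry b) {
        return a.size != b.size || a.modifiedMillis != b.modifiedMillis;
    }

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Hashes the file only when size and time match the stored entry. */
    static boolean contentChanged(Entry previous, Path file) throws IOException {
        Entry now = new Entry(Files.size(file),
                Files.getLastModifiedTime(file).toMillis(), null);
        if (metadataDiffers(previous, now)) {
            return true;
        }
        // A stored checksum of null never equals a real digest, so this reports "changed".
        return !sha256(file).equals(previous.checksum);
    }
}
```

If only file names matter, `added` and `removed` are all you need and the `Entry` values can be ignored.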

You mention that your Java implementation (similar to this) is very slow compared to "dir /s". I think there are two reasons for this:
1. File.listFiles() is inherently slow. See the earlier question "Is there a workaround for Java’s poor performance on walking huge directories?" and the Java RFE "File.list(FilenameFilter) is not effective for huge directories" for more information. This shortcoming is apparently addressed by NIO.2 (the java.nio.file API, available since Java 7).
2. Are you traversing your directories using recursion? If so, try a non-recursive approach, like pushing/popping directories to be visited on/off a stack. My limited personal experience suggests that the improvement can be quite significant.
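
Both points can be combined in a rough sketch: an explicit stack replaces the recursion, and NIO.2's `Files.newDirectoryStream` replaces `File.listFiles()`. The class name `IterativeWalker` and the method `listFiles` are made up for illustration, and error handling is kept deliberately simple: an unreadable directory just propagates an `IOException`. Whether this is actually faster on your disk is something you would have to measure.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical walker that visits a directory tree without recursion. */
final class IterativeWalker {

    /** Returns every non-directory entry found under root. */
    static List<Path> listFiles(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> pending = new ArrayDeque<>(); // directories still to be visited
        pending.push(root);

        while (!pending.isEmpty()) {
            Path dir = pending.pop();
            // DirectoryStream reads entries lazily instead of building one big array.
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    // NOFOLLOW_LINKS: symlinked directories are treated as files, which avoids cycles.
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        pending.push(entry);
                    } else {
                        files.add(entry);
                    }
                }
            }
        }
        return files;
    }
}
```

Calling `IterativeWalker.listFiles(Paths.get("some/root"))` returns the files in no particular order, so sort the result if you need a stable listing.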