I am writing a little program that creates an index of all files in my directories. It basically iterates over each file on the disk and stores it in a searchable database.
Given that we do not want to monitor file system events, could we then just keep track of the (name, size, time, checksum) of each file? Computing the file checksum (or cryptographic hash, if you prefer) is going to be the bottleneck. You could compute it once in the initial run and, on subsequent runs, recompute it only when necessary (e.g. when files match on the other three attributes). Of course, we don't need to bother with this if we only want to track filenames and not file content.
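A minimal sketch of that bookkeeping in Java, assuming an in-memory map stands in for the searchable database and SHA-256 is the checksum (the class `ChecksumIndex`, its `update` method, and the `verify` flag are hypothetical names). Size and modification time come from one cheap `Files.readAttributes` call; the file is hashed only when it is new, when that metadata changed, or when `verify` is set for a file whose metadata still matches.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory index keyed by path; a real program would persist this. */
class ChecksumIndex {

    /** The (name, size, time, checksum) tuple kept for each file. */
    static final class FileEntry {
        final String name;
        final long size;
        final long modifiedMillis;
        final String checksum; // hex-encoded SHA-256 of the file content

        FileEntry(String name, long size, long modifiedMillis, String checksum) {
            this.name = name;
            this.size = size;
            this.modifiedMillis = modifiedMillis;
            this.checksum = checksum;
        }
    }

    private final Map<Path, FileEntry> entries = new HashMap<>();
    int hashesComputed = 0; // counts the expensive checksum computations

    /**
     * Records the file and returns true if it is new or its content changed.
     * Hashes only when the cheap metadata differs, or when verify is true and the
     * metadata matches (to catch edits that preserved size and modification time).
     */
    boolean update(Path file, boolean verify) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        long size = attrs.size();
        long mtime = attrs.lastModifiedTime().toMillis();
        FileEntry old = entries.get(file);

        boolean metadataMatches = old != null && old.size == size && old.modifiedMillis == mtime;
        if (metadataMatches && !verify) {
            return false; // trust the metadata and skip hashing
        }
        String checksum = sha256(file);
        entries.put(file, new FileEntry(file.getFileName().toString(), size, mtime, checksum));
        return old == null || !old.checksum.equals(checksum);
    }

    FileEntry get(Path file) {
        return entries.get(file);
    }

    /** Streams the file through SHA-256 so large files are never loaded whole into memory. */
    private String sha256(Path file) throws IOException {
        hashesComputed++;
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be supported by every Java platform", e);
        }
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Counting `hashesComputed` makes the trade-off visible: a second run over unchanged files performs no hashing at all unless verification is requested.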
You mention that your Java implementation (similar to this) is very slow compared to "dir /s". I think there are two reasons for this:
1. File.listFiles() is inherently slow. See the earlier question "Is there a workaround for Java’s poor performance on walking huge directories?" and the Java RFE "File.list(FilenameFilter) is not effective for huge directories" for more information. This shortcoming is addressed by NIO.2 (the java.nio.file API), which shipped with Java 7.
2. Are you traversing your directories using recursion? If so, try a non-recursive approach, such as pushing directories to be visited onto a stack and popping them off one at a time. My limited personal experience suggests that the improvement can be quite significant.
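Both suggestions can be combined. The sketch below, assuming Java 7 or later, walks a tree without recursion by keeping pending directories on an `ArrayDeque` used as a stack, and reads each directory through NIO.2's `Files.newDirectoryStream`, which yields entries lazily instead of returning one large array. `IterativeWalker` and `listFiles` are hypothetical names, and skipping symbolic links and unreadable directories is an assumption of this sketch, not something the answer prescribes.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: non-recursive directory walk using an explicit stack and NIO.2. */
class IterativeWalker {

    /** Returns every regular file under root without using recursion. */
    static List<Path> listFiles(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Path dir = stack.pop();
            // DirectoryStream iterates entries lazily rather than building an array up front
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        stack.push(entry); // visit later instead of recursing now
                    } else if (Files.isRegularFile(entry, LinkOption.NOFOLLOW_LINKS)) {
                        files.add(entry);
                    }
                }
            } catch (AccessDeniedException e) {
                // skip directories we cannot read instead of aborting the whole walk
            }
        }
        return files;
    }
}
```

NIO.2 also provides `Files.walkFileTree`, which performs the traversal itself and hands each file's `BasicFileAttributes` (size, modification time) to a `FileVisitor`, so the metadata needed for the index arrives alongside each path.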