Building a directory tree from a list of file paths

扶醉桌前 提交于 2019-12-23 19:53:32

问题


I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths.

The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow.

The input data is usually sorted alphabetically, so the list would be something like:

C:\Users\Aaron\AppData\Amarok\Afile

C:\Users\Aaron\AppData\Amarok\Afile2

C:\Users\Aaron\AppData\Amarok\Afile3

C:\Users\Aaron\AppData\Blender\alibrary.dll

C:\Users\Aaron\AppData\Blender\and_so_on.txt

From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas.

Edit: It would be better if this tree were lazy loaded from the top down if possible.


回答1:


You have no choice but to do full string comparisons since you can't guarantee where the strings might differ. There are a couple tricks that might speed things up a little:

  • As David said, form a tree, but search for the new insertion point from the previous one (perhaps with the aid of some sort of matchingPrefix routine that will tell you where the new one differs).
  • Use a hash table for each level of the tree if there may be very many files within and you need to count duplicates. (Otherwise, appending to a stack is fine.)



回答2:


if its possible, you can generate your tree structure with the tree command, here




回答3:


To take advantage of the "usually sorted" property of your input data, begin your traversal at the directory where your last file was inserted: compare the directory name of current pathname to the previous one. If they match, you can just insert here, otherwise pop up a level and try again.



来源:https://stackoverflow.com/questions/2445213/building-a-directory-tree-from-a-list-of-file-paths

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!