Construct a tree from a list of file paths (Python) - Performance dependent

盖世英雄少女心 2020-12-08 22:49

Hey, I am working on a very high performance file-managing/analyzing toolkit written in Python. I want to create a function that gives me a list or something like that in a

3 answers
  •  我在风中等你
    2020-12-08 23:16

    First off, "very high performance" and "Python" don't mix well. If you are trying to optimise performance to the extreme, switching to C will bring you benefits far greater than any smart code optimisation you might think of.

    Secondly, it's hard to believe that the bottleneck in a "file-managing/analyzing toolkit" will be this function. I/O operations on disk are at least a few orders of magnitude slower than anything happening in memory. Profiling your code is the only accurate way to gauge this but... I'm ready to buy you a pizza if I'm wrong! ;)
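
    To check where the time really goes, something like the following could be used. It is only a minimal sketch built on the standard cProfile and pstats modules; scan_and_format and profile_call are hypothetical names made up for illustration, not part of the asker's toolkit.

```python
import cProfile
import io
import os
import pstats


def scan_and_format(root):
    """Hypothetical pipeline: walk the disk, then format names in memory."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth of this directory relative to the root (0 for the root itself).
        depth = dirpath[len(root):].count(os.sep)
        lines.append(' ' * (2 * depth) + os.path.basename(dirpath))
        for name in filenames:
            lines.append(' ' * (2 * depth + 2) + name)
    return lines


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats('cumulative').print_stats(10)
    return result, stream.getvalue()
```

    Calling profile_call(scan_and_format, some_directory) and printing the second return value shows how much of the cumulative time is spent in the os.walk machinery compared with the string formatting.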

    I built a silly test function just to take a preliminary measurement:

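    from timeit import Timer as T
    
    # Nested list standing in for a directory tree: each directory is a
    # [name, contents] pair; each file is a plain string.
    PLIST = [['dir', ['file', ['dir2', ['file2']], 'file3']], ['dir3', ['file4', 'file5', 'file6', 'file7']]]
    
    def tree(plist, indent=0):
        """Flatten the nested list into lines indented by depth."""
        level = []
        for el in plist:
            if isinstance(el, list):
                # Recurse into the sub-list, indenting two more spaces.
                level.extend(tree(el, indent + 2))
            else:
                level.append(' ' * indent + el)
        return level
    
    # Three timing runs of 100000 calls each.
    print(T(lambda: tree(PLIST)).repeat(repeat=3, number=100000))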
    

    This outputs:

    [1.0135619640350342, 1.0107290744781494, 1.0090651512145996]
    

    Since the test path list has 10 entries and each run performs 100000 iterations, this means that in 1 second you can process a tree of about 1 million files. Now... unless you are working at Google, that seems an acceptable result to me.
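
    The arithmetic behind that estimate can be spelled out as a short sketch. The timings and the test list are copied from the measurement above; count_entries is a hypothetical helper added only for this illustration.

```python
# Figures copied from the test above.
PLIST = [['dir', ['file', ['dir2', ['file2']], 'file3']],
         ['dir3', ['file4', 'file5', 'file6', 'file7']]]
TIMINGS = [1.0135619640350342, 1.0107290744781494, 1.0090651512145996]
CALLS_PER_RUN = 100000


def count_entries(plist):
    """Count the strings (file and directory names) in the nested list."""
    return sum(count_entries(el) if isinstance(el, list) else 1 for el in plist)


entries = count_entries(PLIST)
best = min(TIMINGS)  # the fastest run is the least disturbed by other load
entries_per_second = entries * CALLS_PER_RUN / best

print(entries, round(entries_per_second))
```

    With 10 entries per call, the rate comes out just under one million entries per second, which is where the "about 1 million files" figure comes from.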

    By contrast, when I started writing this answer, I clicked on the "Properties" option on the root of my main 80 GB HD [this should give me the number of files on it, using C code]. A few minutes have passed, and I'm at around 50 GB, 300000 files...

    HTH! :)
