Benchmarks: does Python have a faster way of walking a network folder?

爱一瞬间的悲伤 2020-12-24 12:24

I need to walk through a folder containing approximately ten thousand files. My old VBScript is very slow at handling this. Since I've started using Ruby and Python since then,

2 answers
  •  無奈伤痛
    2020-12-24 12:40

    Ruby's Dir is implemented in C (the file dir.c, according to this documentation). The Python equivalent, os.walk, is implemented in Python itself.

    It's not surprising that the Python version is slower than the C one, but the Python approach gives a little more flexibility: for example, you can skip entire subtrees named '.svn', '.git' or '.hg' while traversing a directory hierarchy.

    Most of the time, the Python implementation is fast enough.
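
    Whether it is fast enough for a particular share is easiest to settle by measuring. The following is a minimal sketch, assuming Python 3.6 or later (for os.scandir used as a context manager). The helper names count_with_walk, count_with_scandir and timed are illustrative and not part of the original answer. Since Python 3.5, os.walk is itself built on os.scandir, so the two counts may well come out close in time.

```python
import os
import time


def count_with_walk(path):
    """Count non-directory entries under path using os.walk."""
    total = 0
    for _root, _dirs, files in os.walk(path):
        total += len(files)
    return total


def count_with_scandir(path):
    """Count non-directory entries under path using os.scandir directly."""
    total = 0
    stack = [path]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir():
                    # Like os.walk's default, do not descend into
                    # symlinked directories.
                    if not entry.is_symlink():
                        stack.append(entry.path)
                else:
                    total += 1
    return total


def timed(func, path):
    """Return (result, elapsed seconds) for one call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start
```

    Calling timed(count_with_walk, path) and timed(count_with_scandir, path) against the same folder gives a pair of numbers to compare. It is worth running each more than once, because the first pass over a network share may also be paying for cold caches.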

    Update: Skipping files or subdirectories doesn't affect the traversal rate at all, but it can certainly reduce the overall time taken to process a directory tree, because you avoid traversing potentially large subtrees of the main tree. The time saved is of course proportional to how much you skip. In your case, which looks like folders of images, it's unlikely you would save much time (unless the images are under revision control, in which case skipping the subtrees owned by the revision control system might have some impact).

    Additional update: Skipping folders is done by changing the dirs value in place:

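    import os

    path = '.'  # placeholder: root of the tree to walk

    for root, dirs, files in os.walk(path):
        # Prune in place; this relies on topdown=True (the default).
        for skip in ('.hg', '.git', '.svn', '.bzr'):
            if skip in dirs:
                dirs.remove(skip)
        # Now process other stuff at this level, i.e.
        # in directory "root". The skipped folders
        # won't be recursed into.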
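
    The same pruning idea can be wrapped in a reusable generator. This is only a sketch: the names walk_files and VCS_DIRS are made up for illustration, and the folder list is simply the one from the snippet above.

```python
import os

# Folder names to prune; same list as in the snippet above.
VCS_DIRS = frozenset({'.hg', '.git', '.svn', '.bzr'})


def walk_files(path, skip=VCS_DIRS):
    """Yield the path of every file under path, pruning skipped folders."""
    for root, dirs, files in os.walk(path):
        # Slice assignment mutates the very list os.walk holds, so the
        # pruned folders are never visited. Rebinding the name
        # (dirs = [...]) would have no effect on the traversal.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            yield os.path.join(root, name)
```

    For example, sum(1 for _ in walk_files(path)) counts the files that remain once the version-control folders are left out.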
    
