Listing files in a directory with Python when the directory is huge

Submitted on 2019-12-11 03:38:58

Question


I'm trying to deal with many files in Python. I first need to get a list of all the files in a single directory. At the moment, I'm using:

os.listdir(dir)

However, this isn't feasible, since the directory I'm searching has upwards of 81,000 files in it and totals almost 5 gigabytes.

What's the best way of stepping through each file one by one, without Windows deciding that the Python process is not responding and killing it? Because that tends to happen.

It's being run on a 32-bit Windows XP machine, so clearly it can't address more than 4 GB of RAM.

Any ideas from anyone on how to solve this problem?


Answer 1:


You may want to try using the scandir module:

scandir is a module which provides a generator version of os.listdir() that also exposes the extra file information the operating system returns when you iterate a directory. scandir also provides a much faster version of os.walk(), because it can use the extra file information exposed by the scandir() function.

There's an accepted PEP (PEP 471) proposing to merge it into the Python standard library, and it became os.scandir() in Python 3.5, so it has plenty of traction.

Simple usage example from their docs:

import os  # on Python 3.5+ os.scandir() is built in; on older versions, `pip install scandir` and use scandir.scandir()

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name
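
The same pattern applies directly to the question, which is about files rather than subdirectories. A minimal sketch along the same lines, assuming the huge directory's path is held in a hypothetical variable big_dir and process() stands in for whatever you do with each file; entries are yielded one at a time instead of being collected into an 81,000-element list:

import os

def files_in(path):
    """Yield names of regular files under the given path, one at a time."""
    for entry in os.scandir(path):
        if entry.is_file():
            yield entry.name

# hypothetical usage:
# for name in files_in(big_dir):
#     process(name)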



Answer 2:


You could use glob.iglob to avoid reading the entire list of filenames into memory. This returns a generator object allowing you to step through the filenames in your directory one by one:

import glob
import os

# pathname is the path of the huge directory you want to scan
files = glob.iglob(os.path.join(pathname, '*'))  # returns an iterator, not a list

for f in files:
    pass  # do something with f
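
Note that the '*' pattern matches subdirectories as well as files. If you only want regular files, a generator expression keeps the iteration lazy while filtering them out. A minimal sketch, assuming pathname points at the huge directory (the '.' below is just a placeholder):

import glob
import os

pathname = '.'  # placeholder: set this to the huge directory's path

# generator expression: filenames are produced one at a time, subdirectories skipped
only_files = (f for f in glob.iglob(os.path.join(pathname, '*')) if os.path.isfile(f))

for f in only_files:
    pass  # do something with each file f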


Source: https://stackoverflow.com/questions/25550919/listing-files-in-a-directory-with-python-when-the-directory-is-huge
