Is there a way to efficiently yield every file in a directory containing millions of files?

萌比男神i 2020-12-01 18:53

I'm aware of os.listdir, but as far as I can gather, that gets all the filenames in the directory into memory and then returns the list. What I want is a way to yield one filename at a time, work on it, and then move on to the next, without reading them all into memory first.

6 Answers
  •  难免孤独
    2020-12-01 19:24

    @jsbueno's post is really useful, but it is still kind of slow on slow disks, since libc's readdir() only reads 32K of directory entries at a time. I am not an expert on making system calls directly in Python, but I outlined how to write C code that will list a directory with millions of files in a blog post: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/.

    The ideal case would be to call getdents() directly from Python (http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html), so you can specify a read buffer size when loading directory entries from disk, rather than calling readdir(), which as far as I can tell has a buffer size defined at compile time.
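
    Below is a rough sketch of how such a call might look from Python using ctypes, assuming Linux on x86-64 (where the getdents64 syscall number is 217); the struct layout follows linux_dirent64 from the man page above, and the 5 MB buffer size is an arbitrary choice you can tune:

        import ctypes
        import os
        import struct

        SYS_getdents64 = 217          # x86-64 syscall number; differs on other architectures
        BUF_SIZE = 5 * 1024 * 1024    # read 5 MB of directory entries per syscall

        libc = ctypes.CDLL(None, use_errno=True)

        def iter_dir(path, buf_size=BUF_SIZE):
            """Yield filenames from `path`, one syscall buffer at a time."""
            fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
            buf = ctypes.create_string_buffer(buf_size)
            try:
                while True:
                    nread = libc.syscall(SYS_getdents64, fd, buf, buf_size)
                    if nread < 0:
                        raise OSError(ctypes.get_errno(), "getdents64 failed")
                    if nread == 0:    # end of directory
                        break
                    offset = 0
                    while offset < nread:
                        # linux_dirent64: u64 d_ino, s64 d_off, u16 d_reclen,
                        # u8 d_type, then the NUL-terminated name (19-byte header)
                        d_ino, d_off, d_reclen, d_type = struct.unpack_from(
                            "<QqHB", buf, offset)
                        name = buf.raw[offset + 19 : offset + d_reclen].split(b"\0", 1)[0]
                        if name not in (b".", b".."):
                            yield name.decode("utf-8", "surrogateescape")
                        offset += d_reclen
            finally:
                os.close(fd)

    (Note that on Python 3.5+, os.scandir() already returns a lazy iterator over directory entries, which is usually the practical answer, though it does not let you choose the buffer size.)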
