Fast Linux file count for a large number of files

名媛妹妹 2020-12-22 17:21

I'm trying to figure out the best way to find the number of files in a particular directory when there are a very large number of files (more than 100,000).

When the

17 answers
  •  南方客 (original poster)
    2020-12-22 18:05

    You should use the getdents() system call in place of ls/find.

    Here is a very good article that describes the getdents approach:

    http://be-n.com/spw/you-can-list-a-million-files-in-a-directory-but-not-with-ls.html

    Here is the extract:

    ls and practically every other method of listing a directory (including Python's os.listdir and find .) rely on libc readdir(). However, readdir() only reads 32K of directory entries at a time, which means that if you have a lot of files in the same directory (e.g., 500 million directory entries) it is going to take an insanely long time to read all the directory entries, especially on a slow disk. For directories containing a large number of files, you'll need to dig deeper than tools that rely on readdir(). You will need to use the getdents() system call directly, rather than helper methods from the C standard library.

    We can find the C code to list the files using getdents() from here:

    There are two modifications you will need to make in order to quickly list all the files in a directory.

    First, increase the buffer size from X to something like 5 megabytes.

    #define BUF_SIZE (1024*1024*5)
    

    Then modify the main loop where it prints out the information about each file in the directory to skip entries with inode == 0. I did this by adding

    if (d->d_ino != 0) printf(...);
    

    In my case I also really only cared about the file names in the directory so I also rewrote the printf() statement to only print the filename.

    if (d->d_ino) printf("%s\n", (char *) d->d_name);
    

    Compile it (it doesn't need any external libraries, so it's super simple to do)

    gcc listdir.c -o listdir
    

    Now just run

    ./listdir [directory with an insane number of files]
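
Since the question asks for a count rather than a listing, the same idea can be sketched as a function that only increments a counter. This is a minimal sketch, not the article's code: the function name `count_dir_entries` is hypothetical, it assumes Linux with the `getdents64` system call, and the struct layout is the one given in the getdents(2) man page. It allocates the 5 MB buffer on the heap rather than the stack and skips `.` and `..` as well as entries with inode 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of one record returned by getdents64(2), per the man page. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* length of this record in bytes */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* null-terminated file name */
};

#define BUF_SIZE (1024 * 1024 * 5)

/* Hypothetical helper: returns the number of entries in `path`,
 * excluding "." and "..", or -1 on error. */
long count_dir_entries(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    char *buf = malloc(BUF_SIZE);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long count = 0;
    for (;;) {
        /* Fill the buffer with as many directory records as fit. */
        long nread = syscall(SYS_getdents64, fd, buf, BUF_SIZE);
        if (nread < 0) {
            count = -1;       /* read error */
            break;
        }
        if (nread == 0)
            break;            /* end of directory */

        /* Walk the variable-length records in the buffer. */
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            if (d->d_ino != 0 &&
                strcmp(d->d_name, ".") != 0 &&
                strcmp(d->d_name, "..") != 0)
                count++;
            pos += d->d_reclen;
        }
    }

    free(buf);
    close(fd);
    return count;
}
```

Calling `count_dir_entries(argv[1])` from a small `main()` and printing the result gives a rough replacement for `ls | wc -l`. Note that the count covers every entry type (subdirectories, symlinks, and so on), not only regular files; filtering on `d_type` would be needed for that, and some filesystems report `DT_UNKNOWN` there.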
    
