Fast Linux file count for a large number of files

名媛妹妹 2020-12-22 17:21

I'm trying to figure out the best way to find the number of files in a particular directory when there are a very large number of files (more than 100,000).

When the

17 Answers
  •  北海茫月 2020-12-22 17:56

    I came here when trying to count the files in a data set of approximately 10,000 folders with approximately 10,000 files each. The problem with many of the approaches is that they implicitly stat 100 million files, which takes ages.
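
    For illustration (this command is my addition, not from the answer), the typical recursive count at that point looks something like the following; the long listing needs per-file metadata, so it ends up issuing one lstat() call per file:

    # one lstat() per entry just to build the long-format line
    ls -lR /your/directory | grep -c '^-'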

    I took the liberty of extending the approach by Christopher Schultz so that it supports passing directories via arguments (his recursive approach uses stat as well).

    Put the following into a file dircnt_args.c:

    #include <stdio.h>
    #include <dirent.h>

    int main(int argc, char *argv[]) {
        DIR *dir;
        struct dirent *ent;
        long count;
        long countsum = 0;
        int i;

        for (i = 1; i < argc; i++) {
            dir = opendir(argv[i]);
            if (dir == NULL) {
                /* skip arguments that cannot be opened as a directory */
                perror(argv[i]);
                continue;
            }
            count = 0;

            /* readdir() only walks the directory entries; no stat() per file */
            while ((ent = readdir(dir)))
                ++count;

            closedir(dir);

            /* note: the count includes the . and .. entries */
            printf("%s contains %ld files\n", argv[i], count);
            countsum += count;
        }
        printf("sum: %ld\n", countsum);

        return 0;
    }
    

    After compiling with gcc -o dircnt_args dircnt_args.c, you can invoke the program like this:

    dircnt_args /your/directory/*
    

    On 100 million files in 10,000 folders, the above completes quite quickly (approximately 5 minutes for the first run, and approximately 23 seconds on a warm cache).

    The only other approach that finished in under an hour was ls, at about 1 minute on a warm cache: ls -f /your/directory/* | wc -l. The count is off by a couple of lines per directory, though, because ls prints a header line and a blank line for each directory argument.
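
    If the exact number matters, one way around those extra lines (a sketch of my own; it assumes /your/directory contains only subdirectories) is to count each directory separately and sum:

    total=0
    for d in /your/directory/*/; do
        # ls -f disables sorting and implies -a, so . and .. are counted
        # here as well, matching the C program above
        total=$(( total + $(ls -f "$d" | wc -l) ))
    done
    echo "sum: $total"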

    Contrary to my expectations, none of my attempts with find returned within an hour :-/
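
    For reference, the find variant I would expect here looks like this (my reconstruction; the answer does not say which invocations were tried):

    # may still stat() entries on filesystems that do not report d_type
    find /your/directory -type f | wc -l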
