How to list first level directories only in C?

清歌不尽 2020-12-19 21:49

In a terminal I can call ls -d */. Now I want a C program to do that for me, like this:

#include 
#include          


        
5 Answers
  • 2020-12-19 22:24

    Just call system(). On Unix, globs are expanded by the shell, and system() runs its command through a shell.

    You can avoid the whole fork-exec thing by doing the glob(3) yourself:

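    #include <glob.h>
    #include <stdio.h>

    int main(void)
    {
        glob_t gbuf;
        int ec = glob("*/", 0, NULL, &gbuf);
        if (ec == 0) {
            /* gl_pathv is a NULL-terminated array of the matching names */
            for (char **p = gbuf.gl_pathv; p && *p; p++)
                printf("%s\n", *p);
            globfree(&gbuf);
        } else {
            /* handle glob error (GLOB_NOMATCH means no directories matched) */
        }
        return ec != 0;
    }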
    

    You could pass the results to a spawned ls, but there's hardly a point in doing that.

    (If you do want to do fork and exec, you should start with a template that does proper error checking -- each of those calls may fail.)
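
    A minimal sketch of such a template, assuming the goal is still to let a shell expand the glob. The helper name run_in_shell is made up for illustration; every call that can fail is checked:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through /bin/sh and return its exit
   status, or -1 if the child could not be started or was killed. */
int run_in_shell(const char *cmd)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: the shell, not ls, expands any glob in cmd. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        perror("execl");              /* reached only if exec failed */
        _exit(127);
    }

    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0); /* retry if interrupted by a signal */
    } while (w == -1 && errno == EINTR);
    if (w == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Calling run_in_shell("ls -d */") would then behave like the terminal command, with the exit status available to the caller.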

  • 2020-12-19 22:35

    Another less low-level approach, with system():

    #include <stdlib.h>
    
    int main(void)
    {
        system("/bin/ls -d */");
        return 0;
    }
    

    Note that with system() you don't need to fork() yourself. However, system() is generally best avoided when possible, since it hands the whole command string to a shell.


    As Nominal Animal said, this will fail when the number of subdirectories is too big! See his answer for more.
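
    One way to at least notice such a failure is to look at what system() returns instead of ignoring it. A rough sketch; the wrapper name system_exit_status is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical wrapper: returns the command's exit status (0-255), or -1
   if system() itself failed or the command did not exit normally. */
int system_exit_status(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        perror("system");
        return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

    A non-zero result from system_exit_status("/bin/ls -d */") then signals that the listing did not complete.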

  • 2020-12-19 22:36

    Unfortunately, all solutions based on shell expansion are limited by the maximum command line length, which varies between systems (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices -- as did Bill Gates on 640 kilobytes, once.

    (When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory, during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
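
    The limit can also be queried from inside a C program. A small sketch using sysconf(); the function name is made up, and note that the limit covers the environment variables as well as the arguments:

```c
#include <unistd.h>

/* Hypothetical helper: maximum combined size in bytes of the arguments
   and environment passed to exec, or -1 if no fixed limit is reported. */
long max_exec_arg_bytes(void)
{
    return sysconf(_SC_ARG_MAX);
}
```

    Printing the result with printf("%ld\n", max_exec_arg_bytes()) should give a figure in the same range as the one xargs reports.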

    Fortunately, there are several solutions. One is to use find instead:

    system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");
    

    You can also format the output as you wish, not depending on locale:

    system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");
    

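    If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and -z for sort to use \0 as the separator, too. tr will convert them to newlines for you. The backslashes are doubled so that the C compiler passes \0 and \n through to find and tr; a bare \0 in a C string literal would end the command string early:

    system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\\0' | sort -z | tr -s '\\0' '\\n'");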
    

    If you want the names in an array, use the glob() function instead.

    Finally, as I like to harp every now and then, one can use the POSIX nftw() function to implement this internally (FTW_ACTIONRETVAL and FTW_SKIP_SUBTREE, used below, are glibc extensions, hence the _GNU_SOURCE):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <ftw.h>
    
    #define NUM_FDS 17
    
    int myfunc(const char *path,
               const struct stat *fileinfo,
               int typeflag,
               struct FTW *ftwinfo)
    {
        const char *file = path + ftwinfo->base;
        const int depth = ftwinfo->level;
    
        /* We are only interested in first-level directories.
           Note that depth==0 is the directory itself specified as a parameter.
        */
        if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
            return 0;
    
        /* Don't list names starting with a . */
        if (file[0] != '.')
            printf("%s/\n", path);
    
        /* Do not recurse. */
        return FTW_SKIP_SUBTREE;
    }
    

    and the nftw() call to use the above is obviously something like

    if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
        /* An error occurred. */
    }
    

    The only "issue" in using nftw() is to choose a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.

    You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. In current Linux systems, the default soft limit is typically 1024 per process.

    The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
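
    As a rough sketch of that calculation: the helper name nftw_fd_budget and the cap of 1024 are arbitrary choices for illustration, not anything nftw() requires.

```c
#include <unistd.h>

/* Hypothetical helper: how many descriptors to let nftw() use, keeping
   `reserved` descriptors free for the rest of the program. */
int nftw_fd_budget(long reserved)
{
    long open_max = sysconf(_SC_OPEN_MAX);
    if (open_max == -1)
        open_max = 20;        /* fall back to the POSIX minimum */

    long budget = open_max - reserved;
    if (budget < 4)
        budget = 4;           /* below this, nftw() needs its workarounds early */
    if (budget > 1024)
        budget = 1024;        /* arbitrary cap, chosen for this sketch */
    return (int)budget;
}
```

    The result could be passed in place of NUM_FDS, for example nftw(".", myfunc, nftw_fd_budget(3), FTW_ACTIONRETVAL).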

    If you want to create a test directory with lots of subdirectories, use something like the following Bash:

    mkdir lots-of-subdirs
    cd lots-of-subdirs
    for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done
    

    On my system, running

    ls -d */
    

    in that directory yields a bash: /bin/ls: Argument list too long error, while the find command and the nftw() based program both run just fine.

    You also cannot remove the directories using rmdir directory-*/ for the same reason. Use

    find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir
    

    instead. Or just remove the entire directory and subdirectories,

    cd ..
    rm -rf lots-of-subdirs
    
  • 2020-12-19 22:36

    The lowest-level way to do this is with the same Linux system calls ls uses.

    So look at the output of strace -efile,getdents ls:

    execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
    ...
    openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    getdents(3, /* 23 entries */, 32768)    = 840
    getdents(3, /* 0 entries */, 32768)     = 0
    ...
    

    getdents is a Linux-specific system call. The man page says that it's used under the hood by libc's readdir(3) POSIX API function.
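
    For the curious, here is a Linux-only sketch of calling the system call directly. It assumes the getdents64 variant and the record layout documented in getdents(2); glibc does not declare the struct, so it is defined by hand, and the helper name is made up:

```c
#define _GNU_SOURCE               /* for syscall() and DT_DIR on glibc */
#include <dirent.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64, as documented in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: print the entries the kernel reports as directories
   and return the total number of entries read, or -1 on error. */
int list_dirs_getdents(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    _Alignas(struct linux_dirent64) char buf[32768];
    int total = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread == -1) {
            close(fd);
            return -1;
        }
        if (nread == 0)           /* end of directory */
            break;
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            if (d->d_type == DT_DIR)
                printf("%s/\n", d->d_name);
            total++;
            pos += d->d_reclen;   /* records are variable-length */
        }
    }
    close(fd);
    return total;
}
```

    This prints . and .. too, and it silently misses directories on filesystems that report DT_UNKNOWN, which is one reason to prefer readdir() with a stat fallback.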


    The lowest-level portable way (portable to POSIX systems) is to use the libc functions to open a directory and read the entries. POSIX doesn't specify the exact system call interface, unlike for non-directory files.

    These functions:

    DIR *opendir(const char *name);
    struct dirent *readdir(DIR *dirp);
    

    can be used like this:

    // print all directories, and symlinks to directories, in the CWD.
    // like sh -c 'ls -1UF -d */'  (single-column output, no sorting, append a / to dir names)
    // except that, unlike the glob, this also prints . and .. and hidden directories
    // tested and works on Linux, with / without working d_type
    
    #define _GNU_SOURCE    // includes _BSD_SOURCE for DT_UNKNOWN etc.
    #include <dirent.h>
    #include <stdint.h>
    
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void) {
        DIR *dirhandle = opendir(".");     // POSIX doesn't require this to be a plain file descriptor.  Linux uses open(".", O_DIRECTORY); to implement this
        if (!dirhandle) {
            perror("opendir");
            return EXIT_FAILURE;
        }
        struct dirent *de;
        while ((de = readdir(dirhandle)) != NULL) { // NULL means end of directory (or an error)
            _Bool is_dir;
        #ifdef _DIRENT_HAVE_D_TYPE
            if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
               // don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
               is_dir = (de->d_type == DT_DIR);
            } else
        #endif
            {  // the only method if d_type isn't available,
               // otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
               struct stat stbuf;
               // stat follows symlinks, lstat doesn't.
               if (stat(de->d_name, &stbuf) != 0)
                   continue;                       // e.g. a dangling symlink: not a directory
               is_dir = S_ISDIR(stbuf.st_mode);
            }
    
            if (is_dir) {
               printf("%s/\n", de->d_name);
            }
        }
        closedir(dirhandle);
        return EXIT_SUCCESS;
    }
    

    There's also a fully compilable example of reading directory entries and printing file info in the Linux stat(3posix) man page. (not the Linux stat(2) man page; it has a different example).


    The man page for readdir(3) says the Linux declaration of struct dirent is:

       struct dirent {
           ino_t          d_ino;       /* inode number */
           off_t          d_off;       /* not an offset; see NOTES */
           unsigned short d_reclen;    /* length of this record */
           unsigned char  d_type;      /* type of file; not supported
                                          by all filesystem types */
           char           d_name[256]; /* filename */
       };
    

    d_type is either DT_UNKNOWN, in which case you need to stat to learn whether the directory entry is itself a directory, or it is DT_DIR or some other type, in which case you know whether it is a directory without having to stat it (for DT_LNK you still need stat if symlinks to directories should count).

    Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirs. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk).

    Sometimes you're just matching on filenames, and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when it's not cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and that you have some way to detect the breakage when someone in the future tries to use this code on another FS type.)
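
    The same d_type-with-fallback idea can be sketched for a directory other than the CWD by doing the stat relative to the open directory with fstatat(). The helper name print_subdirs is hypothetical, and this variant skips . and ..:

```c
#define _DEFAULT_SOURCE           /* exposes d_type, DT_* and dirfd on glibc */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: print the subdirectories of `path` (following
   symlinks) and return how many were found, or -1 on error. */
int print_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        const char *n = de->d_name;
        /* skip "." and ".." */
        if (n[0] == '.' && (n[1] == '\0' || (n[1] == '.' && n[2] == '\0')))
            continue;

        int is_dir;
        if (de->d_type == DT_DIR) {
            is_dir = 1;
        } else if (de->d_type == DT_UNKNOWN || de->d_type == DT_LNK) {
            /* Fallback: stat relative to the open directory, not the CWD. */
            struct stat st;
            is_dir = fstatat(dirfd(dir), n, &st, 0) == 0 && S_ISDIR(st.st_mode);
        } else {
            is_dir = 0;
        }

        if (is_dir) {
            printf("%s/\n", n);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

    Using fstatat() avoids building path strings by hand, which a plain stat(de->d_name, ...) would need for any directory other than ".".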

  • 2020-12-19 22:38

    If you are looking for a simple way to get a list of folders into your program, I'd rather suggest the spawnless way, not calling an external program, and use the standard POSIX opendir/readdir functions.

    It's almost as short as your program, but has several additional advantages:

    • you get to pick folders and files at will by checking the d_type
    • you can elect to early discard system entries and (semi)hidden entries by testing the first character of the name for a .
    • you can immediately print out the result, or store it in memory for later use
    • you can do additional operations on the list in memory, such as sorting and removing other entries that don't need to be included.

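    #define _DEFAULT_SOURCE   /* for d_type and DT_DIR on glibc */
    #include <stdio.h>
    #include <sys/types.h>
    #include <dirent.h>
    
    int main( void )
    {
        DIR *dirp;
        struct dirent *dp;
    
        dirp = opendir(".");
        if (dirp == NULL)
        {
            perror("opendir");
            return 1;
        }
        while ((dp = readdir(dirp)) != NULL)
        {
            /* d_type holds a single type code, not bit flags, so compare with == */
            if (dp->d_type == DT_DIR)
            {
                /* exclude common system entries and (semi)hidden names */
                if (dp->d_name[0] != '.')
                    printf ("%s\n", dp->d_name);
            }
        }
        closedir(dirp);
    
        return 0;
    }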
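
    For the "store it in memory" and "sorting" points in the list above, one possible sketch uses scandir(), which reads, filters and sorts in one call. The helper names are made up, and like the program above it trusts d_type:

```c
#define _DEFAULT_SOURCE           /* exposes scandir, d_type and DT_DIR on glibc */
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Filter: keep directories whose names do not start with '.'. */
static int visible_dir(const struct dirent *de)
{
    return de->d_name[0] != '.' && de->d_type == DT_DIR;
}

/* Comparator: plain byte order, independent of the locale. */
static int by_name(const struct dirent **a, const struct dirent **b)
{
    return strcmp((*a)->d_name, (*b)->d_name);
}

/* Hypothetical helper: store the sorted entries in *list and return
   their count, or -1 on error. */
int sorted_subdirs(const char *path, struct dirent ***list)
{
    return scandir(path, list, visible_dir, by_name);
}

/* scandir() allocates each entry and the array with malloc(). */
void free_subdirs(struct dirent **list, int n)
{
    for (int i = 0; i < n; i++)
        free(list[i]);
    free(list);
}
```

    After n = sorted_subdirs(".", &list), the names are available as list[0]->d_name through list[n-1]->d_name until free_subdirs(list, n) is called.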
    