How to determine if a path is inside a directory? (POSIX)

Question

In C, using POSIX calls, how can I determine if a path is inside a target directory?

For example, a web server has its root directory in /srv, which is also the daemon's current working directory (what getcwd() returns). When parsing a request for /index.html, it returns the contents of /srv/index.html.

How can I filter out requests for paths outside of /srv?

For example: /../etc/passwd, /valid/../../etc/passwd, and so on.

Splitting the path at / and rejecting any array containing .. will break valid accesses such as /srv/valid/../index.html.

Is there a canonical way to do this with system calls? Or do I need to manually walk the path and count directory depth?

Answer 1:

There's always realpath:

The realpath() function shall derive, from the pathname pointed to by *file_name*, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.', '..', or symbolic links.

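Then compare what realpath gives you with your desired root directory: the result should begin with the root, and the match should end at a path-component boundary, so that /srvother is not mistaken for something inside /srv.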
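A minimal sketch of that comparison, assuming a POSIX.1-2008 realpath() that allocates its result when passed NULL. The helper name path_is_inside and its return convention are made up for illustration and are not part of any library. Note that realpath() fails for paths that do not exist, so this only classifies files that are already present.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and return values are assumptions):
 * returns 1 if `path` resolves to `root` or to something beneath it,
 * 0 if it resolves elsewhere, -1 if realpath() fails for either argument
 * (for example because the file does not exist). */
int path_is_inside(const char *root, const char *path)
{
    char *real_root = realpath(root, NULL);   /* malloc'd by realpath() */
    if (real_root == NULL)
        return -1;

    char *real_path = realpath(path, NULL);
    if (real_path == NULL) {
        free(real_root);
        return -1;
    }

    int inside;
    if (strcmp(real_root, "/") == 0) {
        inside = 1;                           /* everything is under "/" */
    } else {
        size_t n = strlen(real_root);
        /* The prefix must end at a component boundary:
         * "/srv" must match "/srv" and "/srv/x" but not "/srvx". */
        inside = strncmp(real_path, real_root, n) == 0 &&
                 (real_path[n] == '/' || real_path[n] == '\0');
    }

    free(real_root);
    free(real_path);
    return inside;
}
```

With the daemon's layout from the question, path_is_inside("/srv", "/srv/valid/../index.html") would be expected to return 1 if that file exists, while a request that resolves to /etc/passwd would return 0.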

You could also clean up the filename by hand by expanding the double-dots before you prepend the "/srv". Split the incoming path on slashes and walk through it piece by piece. If you get a "." then remove it and move on; if you get a "..", then remove it and the previous component (taking care not to go past the first entry in your list); if you get anything else, just move on to the next component. Then paste what's left back together with slashes between the components and prepend your "/srv/". So if someone gives you "/valid/../../etc/passwd", you'll end up with "/srv/etc/passwd" and "/where/is/../pancakes/house" will end up as "/srv/where/pancakes/house".

That way you can't get outside "/srv" (except through symbolic links of course) and an incoming "/../.." will be the same as "/" (just like in a normal file system). But you'd still want to use realpath if you're worried about symbolic links under "/srv".
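The component-by-component cleanup described above could be sketched as follows. This is purely lexical and never touches the file system, so it does not protect against symbolic links. The function name map_request, its signature, and the assumption that root is given without a trailing slash are all illustrative choices, not something from the original answer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `root` followed by the normalized `request`
 * into `out`. Assumes `root` has no trailing slash (e.g. "/srv").
 * Returns 0 on success, -1 if `out` is too small. */
int map_request(const char *root, const char *request, char *out, size_t outsz)
{
    size_t base = strlen(root);
    if (base + 2 > outsz)               /* room for root, "/" and the NUL */
        return -1;
    memcpy(out, root, base);
    size_t len = base;                  /* current length of the result */

    const char *p = request;
    while (*p != '\0') {
        while (*p == '/')               /* skip separators */
            p++;
        size_t n = strcspn(p, "/");     /* length of the next component */
        if (n == 0)
            break;                      /* only trailing slashes were left */

        if (n == 1 && p[0] == '.') {
            /* "." : nothing to do */
        } else if (n == 2 && p[0] == '.' && p[1] == '.') {
            /* ".." : drop the previous component, never climbing past root */
            while (len > base && out[len - 1] != '/')
                len--;
            if (len > base)
                len--;                  /* drop the separator as well */
        } else {
            if (len + 1 + n + 1 > outsz)
                return -1;
            out[len++] = '/';
            memcpy(out + len, p, n);
            len += n;
        }
        p += n;
    }

    if (len == base)
        out[len++] = '/';               /* an empty result maps to "root/" */
    out[len] = '\0';
    return 0;
}
```

Because ".." can only remove components that were appended after the root, the result always starts with the root prefix, which is the property the answer relies on.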

Working with the path name component by component would also allow you to break the connection between the layout you present to the outside world and the actual file system layout; there's no need for "/this/that/other/thing" to map to an actual "/srv/this/that/other/thing" file anywhere, the path could just be a key in some sort of database or some sort of namespace path to a function call.

Answer 2:

To determine if a file F is within a directory D, first stat D to determine its device number and inode number (members st_dev and st_ino of struct stat).

Then stat F to determine if it is a directory. If not, call dirname to determine the name of the directory containing it. Set G to the name of this directory. If F was already a directory, set G=F.

Now, F is within D if and only if G is within D. Next we have a loop.

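while (1) {
  if (samefile(d_statinfo.st_dev, d_statinfo.st_ino, G)) {
    return 1; // F was within D
  } else if (0 == strcmp("/", G)) {
    return 0; // reached the root without meeting D: F was not within D
  }
  G = dirname(G); // G must be a modifiable copy: dirname() may alter its argument
}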

The samefile function is simple:

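#include <sys/types.h>
#include <sys/stat.h>

// Returns nonzero if path refers to the file identified by (ddev, dino).
int samefile(dev_t ddev, ino_t dino, const char *path) {
  struct stat st;
  if (0 == stat(path, &st)) {
    return ddev == st.st_dev && dino == st.st_ino;
  } else {
    // stat failed (e.g. path does not exist): report "not the same file".
    // Alternatively return an error value, but then change the caller to detect it.
    return 0;
  }
}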

This will work on POSIX filesystems. But many filesystems are not POSIX. Problems to look out for include:

Filesystems where the device/inode are not unique. Some FUSE filesystems are examples of this; they sometimes make up inode numbers when the underlying filesystems don't have them. They shouldn't re-use inode numbers, but some FUSE filesystems have bugs.
Broken NFS implementations. On some systems all NFS filesystems have the same device number. If they pass through the inode number as it exists on the server, this could cause a problem (though I've never seen it happen in practice).
Linux bind mount points. If /a is a bind mount of /b, then /a/1 correctly appears to be inside /a, but with the implementation above, /b/1 also appears to be inside /a. I think that's probably the correct answer. If it is not the result you prefer, you can change the return 1 case to also compare the path names with strcmp(). For this to work you will need to start by calling realpath on both F and D. The realpath call can be quite expensive (since it may need to hit the disk a number of times).
The special path //foo/bar. POSIX allows path names beginning with exactly two slashes to be interpreted in an implementation-defined manner. Actually I forget the precise level of guarantee about semantics that POSIX provides. I think that POSIX allows //foo/bar and //baz/ugh to refer to the same file. The device/inode check should still do the right thing for you but you may find it does not (i.e. you may find that //foo/bar and //baz/ugh can refer to the same file but have different device/inode numbers).

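This answer assumes that we start with an absolute path for both F and D, and that F contains no .. components or symbolic links, because dirname() works on the path string alone (the parent it yields for /srv/link/x is /srv/link even when link points elsewhere). If this is not guaranteed you may need to do some conversion using realpath() and getcwd(). This will be a problem if the name of the current directory is longer than PATH_MAX (which can certainly happen).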
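For reference, the device/inode walk from this answer can be put together into one compilable function roughly as below. The name is_within and the -1 error convention are assumptions made for this sketch. It stats F itself on the first iteration instead of checking whether F is a directory first, which only costs one extra stat call. It also assumes F is absolute and already canonical, as noted above.

```c
#define _XOPEN_SOURCE 700
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if absolute path `f` is `d` or lies beneath
 * it (judged by device and inode numbers), 0 if not, -1 on error. */
int is_within(const char *f, const char *d)
{
    struct stat dst, st;
    if (f[0] != '/' || stat(d, &dst) != 0)
        return -1;

    char *g = strdup(f);                /* dirname() may modify its argument */
    if (g == NULL)
        return -1;

    int result;
    for (;;) {
        if (stat(g, &st) != 0) {        /* g does not exist or is unreadable */
            result = -1;
            break;
        }
        if (st.st_dev == dst.st_dev && st.st_ino == dst.st_ino) {
            result = 1;                 /* reached D while walking upwards */
            break;
        }
        if (strcmp(g, "/") == 0) {
            result = 0;                 /* reached the root without meeting D */
            break;
        }
        /* dirname() may return a pointer into g or to static storage;
         * copy the parent back so that g stays a buffer we own. */
        char *parent = dirname(g);
        if (parent != g)
            memmove(g, parent, strlen(parent) + 1);
    }

    free(g);
    return result;
}
```

A caller would typically pass the output of realpath() as f, for example is_within(resolved, "/srv"), and treat both 0 and -1 as a refusal.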

Answer 3:

You should simply process .. yourself and remove the previous path component when it's found, so that there are no occurrences of .. in the final string you use for opening files.

Source: https://stackoverflow.com/questions/7134667/how-to-determine-if-a-path-is-inside-a-directory-posix

Tags

posix