The select() and pselect() system calls modify their arguments (the 'fd_set *' arguments), so the input value tells the system which file descriptors to check.
Since struct fd_set is just a regular C structure, that should always be fine. I personally don't like doing structure copying via the = operator, since I've worked on plenty of platforms that didn't have access to the normal set of compiler intrinsics. Using memcpy() explicitly rather than having the compiler insert a function call is a better way to go, in my book.
From the C spec, section 6.5.16.1 Simple assignment (edited here for brevity):
One of the following shall hold:
...
- the left operand has a qualified or unqualified version of a structure or union type compatible with the type of the right;
...
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.
So there you go: as long as struct fd_set is actually a regular C struct, you're guaranteed success. It does depend, however, on your compiler emitting some kind of code to do it, or relying on whatever memcpy() intrinsic it uses for structure assignment. If your platform can't link against the compiler's intrinsic libraries for some reason, it may not work.
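As a rough illustration of the two copying styles discussed above, here is a minimal sketch. The function names (copy_by_assignment, copy_by_memcpy, same_members) are made up for this example, and it assumes fd_set is a plain structure as the answer argues.

```c
#include <string.h>
#include <sys/select.h>

/* Copy an fd_set with plain structure assignment. */
void copy_by_assignment(fd_set *dest, const fd_set *src)
{
    *dest = *src;
}

/* Copy an fd_set with an explicit memcpy() call. */
void copy_by_memcpy(fd_set *dest, const fd_set *src)
{
    memcpy(dest, src, sizeof *dest);
}

/* Return 1 if both sets report the same membership for every
 * descriptor in [0, FD_SETSIZE), otherwise 0. */
int same_members(fd_set *a, fd_set *b)
{
    int fd;

    for (fd = 0; fd < FD_SETSIZE; fd++) {
        if (!FD_ISSET(fd, a) != !FD_ISSET(fd, b))
            return 0;
    }
    return 1;
}
```

Comparing through FD_ISSET() rather than memcmp() keeps the check independent of any padding inside the structure.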
You will have to play some tricks if you have more open file descriptors than will fit into struct fd_set. The Linux man page says:
An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET() with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior. Moreover, POSIX requires fd to be a valid file descriptor.
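One way to stay clear of the undefined behaviour described in that quote is to range-check the descriptor before touching the set. This is only a sketch; checked_fd_set is a hypothetical helper, not part of any system header, and it does not verify that fd is actually an open descriptor.

```c
#include <sys/select.h>

/* Hypothetical helper: add fd to *set only if it lies in [0, FD_SETSIZE).
 * Returns 0 on success, -1 if fd is out of range (*set is left unchanged). */
int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

A caller that gets -1 back has more descriptors than select() can handle with this FD_SETSIZE and needs one of the "tricks" mentioned above, or a different interface.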
As mentioned below, it might not be worth the effort to prove that your code is safe on all systems. FD_COPY() is provided for just such a use, and is, presumably, always guaranteed:
FD_COPY(&fdset_orig, &fdset_copy) replaces an already allocated &fdset_copy file descriptor set with a copy of &fdset_orig.
You are correct that POSIX doesn't guarantee that copying an fd_set has to "work". I'm not personally aware of anywhere that it doesn't, but then I've never done the experiment.
You can use the poll() alternative (which is also POSIX). It works in a very similar way to select(), except that the input/output parameter is not opaque (and contains no pointers, so a bare memcpy will work), and its design also entirely removes the need to make a copy of the "requested file descriptors" structure (because the "requested events" and "returned events" are stored in different fields).
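To show what the separate "requested" and "returned" fields buy you, here is a small sketch using a pipe. The names wait_readable and poll_demo are invented for this example; the point is only that the same struct pollfd is passed to poll() twice without being rebuilt or copied.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for the descriptor described by *pfd.
 * The caller fills in pfd->fd and pfd->events once; poll() only writes
 * pfd->revents, so the same structure can be passed again unchanged.
 * Returns 1 if data can be read, 0 if not (timeout, or only other
 * events were reported), -1 on error. */
int wait_readable(struct pollfd *pfd, int timeout_ms)
{
    int n = poll(pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return (pfd->revents & POLLIN) ? 1 : 0;
}

/* Demonstration on a pipe; returns 0 if everything behaved as expected. */
int poll_demo(void)
{
    int p[2];
    struct pollfd pfd;
    int ok;

    if (pipe(p) != 0)
        return -1;

    pfd.fd = p[0];
    pfd.events = POLLIN;    /* requested events: set once */
    pfd.revents = 0;

    ok = (wait_readable(&pfd, 0) == 0);        /* nothing written yet */
    ok = ok && (write(p[1], "x", 1) == 1);
    ok = ok && (wait_readable(&pfd, 0) == 1);  /* now readable */
    ok = ok && (pfd.events == POLLIN);         /* request left untouched */

    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```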
You are also correct to surmise that select() (and poll()) don't scale particularly well to large numbers of file descriptors - this is because every time the function returns, you must loop through every file descriptor to test if there was activity on it. The solutions to this are various non-standard interfaces (e.g. Linux's epoll(), FreeBSD's kqueue), which you may need to look into if you find you are having latency problems.
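To make the scaling point concrete, this is roughly what the scan after select() returns looks like. collect_ready is a hypothetical helper written for this illustration; note that the loop runs over every candidate descriptor up to maxfd, however few of them are actually ready.

```c
#include <sys/select.h>

/* Scan the set returned by select() and store the ready descriptors in
 * out[], up to cap entries. Returns the number of descriptors stored.
 * The cost is proportional to maxfd, not to the number that are ready. */
int collect_ready(fd_set *ready, int maxfd, int *out, int cap)
{
    int fd;
    int n = 0;

    for (fd = 0; fd <= maxfd && fd < FD_SETSIZE; fd++) {
        if (FD_ISSET(fd, ready) && n < cap)
            out[n++] = fd;
    }
    return n;
}
```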
I've done a little research on MacOS X, Linux, AIX, Solaris and HP-UX, and there are some interesting results. I used the following program:
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */
#ifdef SET_FD_SETSIZE
#define FD_SETSIZE SET_FD_SETSIZE
#endif
#ifdef USE_SYS_TIME_H
#include <sys/time.h>
#else
#include <sys/select.h>
#endif /* USE_SYS_TIME_H */
#include <stdio.h>
int main(void)
{
    printf("FD_SETSIZE = %d; sizeof(fd_set) = %d\n", (int)FD_SETSIZE, (int)sizeof(fd_set));
    return 0;
}
It was compiled twice on each platform:
cc -o select select.c
cc -o select -DSET_FD_SETSIZE=16384 select.c
(And on one platform, HP-UX 11.11, I had to add -DUSE_SYS_TIME_H to get things to compile at all.) I separately did a visual check on FD_COPY - only MacOS X seemed to include it, and that had to be activated by ensuring that _POSIX_C_SOURCE was not defined or by defining _DARWIN_C_SOURCE.
[Per-platform results table not recoverable; surviving notes: "<sys/select.h> header - use <sys/time.h> instead", "<sys/select.h>", "_DARWIN_C_SOURCE is specified".]
Clearly, a trivial modification to the program allows automatic checking of FD_COPY:
#ifdef FD_COPY
printf("FD_COPY is a macro\n");
#endif
What is not necessarily trivial is finding out how to ensure that it is available; you end up doing the manual scan and working out how to trigger it.
On all these machines, it looks like an fd_set can be copied by a structure copy without running into risk of undefined behaviour.
First of all, there is no struct fd_set. It's simply called fd_set. However, POSIX does require it to be a struct type, so copying is well-defined.
Secondly, there is no way under standard C in which the fd_set object could contain dynamically allocated memory, since there is no requirement to use any function/macro to free it before returning. Even if the compiler has alloca (a pre-VLA extension for stack-based allocation), fd_set could not use memory allocated on the stack, because a program might pass a pointer to the fd_set to another function which uses FD_SET, etc., and the allocated memory would cease to be valid as soon as it returns to the caller. Only if the C compiler offered some extension for destructors could fd_set use dynamic allocation.
In conclusion, it seems to be safe just to assign/memcpy fd_set objects, but to be sure, I would do something like:
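#ifndef FD_COPY
#include <string.h> /* memcpy */
/* Source first, destination second, matching the system FD_COPY(orig, copy). */
#define FD_COPY(src,dest) memcpy((dest),(src),sizeof *(dest))
#endif
or alternatively just:
#ifndef FD_COPY
/* Source first, destination second, matching the system FD_COPY(orig, copy). */
#define FD_COPY(src,dest) (*(dest)=*(src))
#endif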
Then you'll use the system's provided FD_COPY macro if it exists, and only fall back to the theoretically-potentially-unsafe version if it's missing.
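For what it's worth, here is a sketch of how such a fallback might sit in an actual select() call. The helper names select_readable and select_demo are invented for this example, and the fallback macro takes the source first and the destination second, which is the order the FD_COPY(&fdset_orig, &fdset_copy) description earlier in this thread uses.

```c
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#ifndef FD_COPY
/* Fallback only: source first, destination second. */
#define FD_COPY(src, dest) memcpy((dest), (src), sizeof *(dest))
#endif

/* Wait until a descriptor in *master is readable or the timeout expires.
 * *master is never handed to select(), so it can be reused on every call;
 * the result is left in *ready. Returns select()'s return value. */
int select_readable(fd_set *master, int maxfd, fd_set *ready, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_COPY(master, ready);
    return select(maxfd + 1, ready, NULL, NULL, &tv);
}

/* Demonstration on a pipe; returns 0 if everything behaved as expected. */
int select_demo(void)
{
    int p[2];
    fd_set master, ready;
    int ok;

    if (pipe(p) != 0)
        return -1;

    FD_ZERO(&master);
    FD_SET(p[0], &master);

    ok = (select_readable(&master, p[0], &ready, 0) == 0); /* nothing to read */
    ok = ok && FD_ISSET(p[0], &master);                    /* master intact */
    ok = ok && (write(p[1], "x", 1) == 1);
    ok = ok && (select_readable(&master, p[0], &ready, 0) == 1);
    ok = ok && FD_ISSET(p[0], &ready);

    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```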
I don't have enough rep to add this as a comment to caf's answer, but there are libraries to abstract over the non-standard interfaces like epoll() and kqueue. libevent is one, and libev another. I think GLib also has one that ties into its mainloop.