Question
I am curious how epoll_wait() receives the event that a registered socket (with epoll_ctl()) is ready for read/write.
I believe that glibc magically handles it.
Then, is there a document describing how the following events can be triggered for a socket?
- EPOLLPRI
- EPOLLRDNORM
- EPOLLRDBAND
- EPOLLWRNORM
- EPOLLWRBAND
- EPOLLMSG
- EPOLLERR
- EPOLLHUP
- EPOLLRDHUP
P.S. Originally I was trying to paste the enum EPOLL_EVENTS in sys/epoll.h on my box here; stackoverflow thinks that I don't format the code block correctly although I wrapped it with pre and then code tag, any idea?
Answer 1:
The most glaring problem with the epoll documentation is its failure to state in "bold caps" that epoll events are, in fact, fully identical to poll(2) events. Indeed, on the kernel side epoll handles its events in terms of the older poll event names:
#define POLLIN 0x0001 // EPOLLIN
#define POLLPRI 0x0002 // EPOLLPRI
#define POLLOUT 0x0004 // EPOLLOUT
#define POLLERR 0x0008 // EPOLLERR
#define POLLHUP 0x0010 // EPOLLHUP
#define POLLNVAL 0x0020 // unused in epoll
#define POLLRDNORM 0x0040 // EPOLLRDNORM
#define POLLRDBAND 0x0080 // EPOLLRDBAND
#define POLLWRNORM 0x0100 // EPOLLWRNORM
#define POLLWRBAND 0x0200 // EPOLLWRBAND
#define POLLMSG 0x0400 // EPOLLMSG
#define POLLREMOVE 0x1000 // unused in epoll
#define POLLRDHUP 0x2000 // EPOLLRDHUP
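The correspondence in the table above can be checked from user space. The following is a minimal sketch, assuming Linux with glibc on a common architecture such as x86-64 (a few architectures define some POLL* values differently); the function name epoll_flags_match_poll is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes POLLMSG and POLLRDHUP only with this */
#endif
#include <poll.h>
#include <sys/epoll.h>

/* Hypothetical helper: returns 1 if every EPOLL* flag has the same numeric
 * value as its POLL* counterpart on this platform, 0 otherwise. */
int epoll_flags_match_poll(void)
{
    return (int)EPOLLIN     == POLLIN
        && (int)EPOLLPRI    == POLLPRI
        && (int)EPOLLOUT    == POLLOUT
        && (int)EPOLLERR    == POLLERR
        && (int)EPOLLHUP    == POLLHUP
        && (int)EPOLLRDNORM == POLLRDNORM
        && (int)EPOLLRDBAND == POLLRDBAND
        && (int)EPOLLWRNORM == POLLWRNORM
        && (int)EPOLLWRBAND == POLLWRBAND
#ifdef POLLMSG   /* skipped if the headers did not expose it */
        && (int)EPOLLMSG    == POLLMSG
#endif
#ifdef POLLRDHUP /* skipped if the headers did not expose it */
        && (int)EPOLLRDHUP  == POLLRDHUP
#endif
        ;
}
```

Calling epoll_flags_match_poll() from a small main() and printing the result is enough to confirm the mapping on a given machine.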
Then, a brief inspection of the kernel source reveals that:
- EPOLLIN and EPOLLRDNORM are identical (epoll returns EPOLLIN | EPOLLRDNORM when data is available for reading from the file descriptor).
- EPOLLOUT and EPOLLWRNORM are identical (epoll returns EPOLLOUT | EPOLLWRNORM when buffer space is available for writing).
- EPOLLRDBAND and EPOLLWRBAND signal availability of out-of-band data on the descriptor (on some sockets this will be the data sent with the MSG_OOB flag).
- EPOLLPRI is a modifier flag and always augments some other event (such as EPOLLERR). Its use is subsystem dependent, as it may mean somewhat different things depending on what purpose the associated file descriptor serves. For example, poll(2) lists out-of-band data on a TCP socket among the conditions reported as POLLPRI.
- EPOLLMSG appears to be unused by the kernel and to serve no purpose.
- EPOLLRDHUP signals that the peer has closed its side of the channel for writing, so no more data will arrive to be read, but the peer may still receive data (handy to establish that no more request data is coming in).
- EPOLLHUP signals that the peer has closed its side of the channel.
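To see some of these events on a real descriptor, here is a minimal sketch that uses a connected AF_UNIX stream pair from socketpair(2) in place of a network socket. The names peer_action and observe_events are hypothetical, and the exact masks are an assumption about Linux AF_UNIX behavior; other socket families may report slightly different combinations. Note that epoll only reports the events that were requested, plus EPOLLERR and EPOLLHUP, which are always reported.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names, introduced only for this illustration. */
enum peer_action { PEER_IDLE, PEER_WRITES, PEER_SHUTS_WR, PEER_CLOSES };

/* Creates a connected socket pair, lets the peer end perform `action`,
 * and returns the event mask epoll reports for our end (0 on failure). */
uint32_t observe_events(enum peer_action action)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    uint32_t seen = 0;
    int epfd = epoll_create1(0);
    struct epoll_event ev = {0}, out = {0};
    ev.events = EPOLLIN | EPOLLRDNORM | EPOLLOUT | EPOLLWRNORM | EPOLLRDHUP;
    ev.data.fd = sv[0];

    if (epfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0) {
        int ok = 1;
        switch (action) {
        case PEER_WRITES:   ok = (write(sv[1], "x", 1) == 1);     break;
        case PEER_SHUTS_WR: ok = (shutdown(sv[1], SHUT_WR) == 0); break;
        case PEER_CLOSES:   ok = (close(sv[1]) == 0); sv[1] = -1; break;
        case PEER_IDLE:     break;
        }
        /* Timeout 0: report whatever is ready right now, without blocking. */
        if (ok && epoll_wait(epfd, &out, 1, 0) == 1)
            seen = out.events;
    }

    if (epfd != -1)
        close(epfd);
    close(sv[0]);
    if (sv[1] != -1)
        close(sv[1]);
    return seen;
}
```

With this sketch, an idle peer should leave only the write-side flags set, a peer write should add EPOLLIN | EPOLLRDNORM, a peer shutdown(SHUT_WR) should add EPOLLRDHUP, and a peer close should add EPOLLHUP.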
Answer 2:
All the critical work for epoll is done in the kernel; the user-space API is just an interface. The previous thread on "Why exactly does ePoll scale better than Poll?" covers how the kernel implements epoll in nice detail.
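One way to convince yourself that glibc is only a thin wrapper is to skip its epoll functions and issue the system calls directly through syscall(2). This is a sketch under assumptions: it targets Linux, uses SYS_epoll_pwait with a NULL signal mask as the equivalent of epoll_wait (not every architecture has a separate epoll_wait system call), and the function name epoll_via_raw_syscalls is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>   /* only for struct epoll_event and the constants */
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: waits for readability on a pipe using raw system
 * calls instead of the glibc epoll wrappers. Returns the reported event
 * mask, or 0 on failure. */
uint32_t epoll_via_raw_syscalls(void)
{
    int p[2];
    if (pipe(p) == -1)
        return 0;

    uint32_t seen = 0;
    long epfd = syscall(SYS_epoll_create1, 0);
    if (epfd != -1) {
        struct epoll_event ev = {0}, out = {0};
        ev.events = EPOLLIN;
        ev.data.fd = p[0];
        if (syscall(SYS_epoll_ctl, (int)epfd, EPOLL_CTL_ADD, p[0], &ev) == 0
            && write(p[1], "x", 1) == 1
            /* NULL mask: no signal mask is swapped in for the wait. */
            && syscall(SYS_epoll_pwait, (int)epfd, &out, 1, 0,
                       NULL, (size_t)8) == 1)
            seen = out.events;
        close((int)epfd);
    }
    close(p[0]);
    close(p[1]);
    return seen;
}
```

The readiness notification arrives exactly as it would through epoll_wait(), which suggests the event tracking lives in the kernel rather than in glibc.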
As for a document describing the events and how they are triggered, the epoll_ctl(2) man page covers each event, for example:
EPOLLIN
The associated file is available for read(2) operations.
EPOLLOUT
The associated file is available for write(2) operations.
For a better description of EPOLLET, you need to read the epoll(7) man page.
This is a complete example of how to use epoll.
You use epoll_ctl to request which events you wish to receive. To receive EPOLLIN events in edge-triggered mode (EPOLLET), the code in that example does this:
event.events = EPOLLIN | EPOLLET;
s = epoll_ctl (efd, EPOLL_CTL_ADD, infd, &event);
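The practical difference EPOLLET makes can be sketched with a pipe: write one byte, then call epoll_wait() twice without ever reading the data. This is an illustrative sketch rather than the linked example; the function name count_reports is hypothetical, and the pipe stands in for the socket infd.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers the read end of a pipe with the given
 * interest mask, writes one byte, then polls twice without reading.
 * Returns how many of the two epoll_wait() calls reported the descriptor,
 * or -1 on setup failure. */
int count_reports(uint32_t interest)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int reports = -1;
    int efd = epoll_create1(0);
    if (efd != -1) {
        struct epoll_event event = {0}, out = {0};
        event.events = interest;
        event.data.fd = p[0];
        if (epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &event) == 0
            && write(p[1], "x", 1) == 1) {
            reports = 0;
            for (int i = 0; i < 2; i++) {
                /* Timeout 0: return immediately if nothing is pending. */
                if (epoll_wait(efd, &out, 1, 0) == 1)
                    reports++;
            }
        }
        close(efd);
    }
    close(p[0]);
    close(p[1]);
    return reports;
}
```

With EPOLLIN alone (level-triggered) the unread byte is reported on both calls; with EPOLLIN | EPOLLET it is reported only once, which is why edge-triggered code is expected to read until EAGAIN before waiting again, as epoll(7) describes.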
Source: https://stackoverflow.com/questions/18103093/how-does-a-socket-event-get-propagated-converted-to-epoll