epoll_wait() receives socket closed twice (read()/recv() returns 0)

喜你入骨 提交于 2019-12-03 20:19:48

As soon as the first read() returns 0, this means that the connection was closed by the peer. Why does the kernel generate a EPOLLIN event for this case? Well, there's no other way to indicate the socket's closure when you're only subscribed to EPOLLIN. You can add EPOLLRDHUP which is basically the same as checking for read() returning 0. However, make sure to test for this flag before you test for EPOLLIN.

  if (flag & EPOLLRDHUP) {
     /* Connection was closed. */
     deleteConnectionData(...);
     close(fd); /* Will unregister yourself from epoll. */
     return;
  }

  if (flag & EPOLLIN) {
    readData(...);
  }

  if (flag & EPOLLOUT) {
    writeData(...);
  }

The way I've ordered these blocks is relevant and the return for EPOLLRDHUP is important too, because it is likely that deleteConnectionData() may have destroyed internal structures. As EPOLLIN is set as well in case of a closure, this could lead to some problems. Ignoring EPOLLIN is safe because it won't yield any data anyway. Same for EPOLLOUT as it's never sent in conjunction with EPOLLRDHUP!

epoll_wait() is running in it's own thread and queues fd events to be handled elsewhere. ... So why when the server load is high, epoll thinks that my application didn't get the close-event and queues new one?

Assuming that EPOLLONESHOT is bug free (I haven't searched for associated bugs though), the fact that you are processing your epoll events in another thread and that it crashes sporadically or under heavy load may mean that there is a race condition somewhere in your application.

May be the object pointed to by epoll_event.data.ptr gets deallocated prematurely before the epoll event is unregistered in another thread when you server does an active close of the client connection.

My first try would be to run it under valgrind and see if it reports any errors.

I would re-check myself against the following sections from epoll(7):

Q6
Will closing a file descriptor cause it to be removed from all epoll sets automatically?

and

o If using an event cache...

There're some good points there.

Removing EPOLLONESHOT made the problem disappear after few other changes. Unfortunately I'm not totally sure what caused it. Using EPOLLONESHOT with threads and adding the fd again manually into the epoll queue was quite certainly the problem. Also the data pointer in epoll struct is released after a delay. Works perfectly now.

Register Signal 0x2000 for Remote host closed connection ex ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET | 0x2000 and check if (flag & 0x2000) for Remote Host Close Connection

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!