python SocketServer stuck on waitpid() syscall

Submitted by 爷,独闯天下 on 2019-12-08 13:34:58

Question


I am using Python 2.7's SocketServer with ForkingMixIn, and it generally works well.

However, under heavy usage (lots of rapidly connecting and disconnecting clients) the server sometimes gets stuck and consumes all the idle CPU (top shows 100% CPU). If I run strace on the process, it shows an endless sequence of waitpid() syscalls. According to ps, though, there are no child processes at that point.

Once this happens, my server becomes unusable and only a restart helps :( Clients can still connect but get no answer. I guess the connections just sit in the OS-side backlog queue and the Python code never accepts them.

It can be reproduced easily, e.g. with a primitive HTTP implementation and a browser (I used Chrome) with CTRL-R (reload) held down for about 10 seconds. Of course the problem is also triggered in normal usage without this "brutal" test, just more rarely, and it was quite hard to even come up with an idea of what the problem could be. I wrote my own implementation of something like SocketServer using os.fork() and the socket functions, and it does not have this problem, but I would be happier with a ready-made, standard solution.

This matters because a script implementing a server can be DoS'ed very easily this way.

What I noticed: I installed a signal handler for SIGCHLD. If I remove it, I can't reproduce the problem, but then I see zombie processes (I guess because they are never wait()'ed for). Even if I install the handler as signal.SIG_IGN, I still experience the problem.

Can anybody tell me what the problem might be and how I can solve it? I'd like to keep using a signal handler, since leaving many zombie processes around is not nice either, especially after a long run.

Thanks for any ideas.
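The post does not include the handler itself. A SIGCHLD handler of the kind described is commonly written as a non-blocking reap loop. The sketch below is only an assumption about its shape; `reap_children`, `reaped`, and `install_handler` are hypothetical names, not taken from the original code.

```python
import errno
import os
import signal

# Hypothetical bookkeeping: (pid, status) pairs of children already collected.
reaped = []


def reap_children(signum=None, frame=None):
    """Collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError as exc:
            if exc.errno == errno.ECHILD:
                return  # this process has no children at all
            raise
        if pid == 0:
            return  # children exist, but none of them has exited yet
        reaped.append((pid, status))


def install_handler():
    """Run reap_children whenever a child process terminates."""
    signal.signal(signal.SIGCHLD, reap_children)
```

One thing worth verifying against the Python 2.7 source: ForkingMixIn appears to keep its own list of child pids and to call waitpid() itself, so a handler that reaps children independently may leave that list stale.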


Answer 1:


Maybe related: What is the cost of many TIME_WAIT on the server side?

It is possible that all of your available connections are in the TIME_WAIT state.

  • check sysctl net.core.somaxconn for the maximum listen backlog (pending connections).
  • check sysctl net.ipv4 for other configuration details (e.g. tw
  • check ulimit -n for max open file descriptors (sockets included)
  • you can try sysctl net.ipv4.tcp_tw_reuse=1 to reuse those sockets quickly (don't keep it enabled unless you know what you're doing).
  • check for file handle leaks.
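The checks above can also be run from Python on the server itself. This is a minimal, Linux-only sketch: it assumes sysctl values are exposed under /proc/sys and that /proc/net/tcp marks TIME_WAIT sockets with state code 06. The helper names `read_sysctl`, `count_time_wait`, and `fd_limits` are made up for illustration.

```python
import resource


def read_sysctl(name):
    """Return a sysctl value such as 'net.core.somaxconn', or None if unavailable."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except (IOError, OSError):
        return None


def count_time_wait(proc_net_tcp_lines):
    """Count TIME_WAIT sockets given the lines of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_lines[1:]:  # the first line is a column header
        fields = line.split()
        # The fourth column is the socket state in hex; 06 is TIME_WAIT.
        if len(fields) > 3 and fields[3] == "06":
            count += 1
    return count


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors, like ulimit -n."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, `read_sysctl("net.core.somaxconn")` and `count_time_wait(open("/proc/net/tcp").readlines())` could be logged periodically to see whether the hang lines up with a pile of TIME_WAIT sockets or with the descriptor limit being reached.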

[Not-so] stupid question: how is your SocketServer implementation different from the standard one plus ForkingMixIn?

That said, ForkingMixIn is really easy to abuse (fork bomb), so you might want to use green threads instead, e.g. the eventlet library ( http://eventlet.net/doc/index.html ).

This might be your problem:

  • this: http://bugs.python.org/issue7978
  • this: http://mail.python.org/pipermail/python-bugs-list/2010-April/095492.html
  • this: http://twistedmatrix.com/trac/ticket/733

    You will see that installing a SIGCHLD handler is discouraged unless you take some extra measures: calling signal.siginterrupt(signal.SIGCHLD, False) for the handler, or using a wake-up fd in the select() call.
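Both measures can be combined in a hand-written accept loop. The following is only a sketch under assumptions, not what SocketServer does internally: the function names are hypothetical, and it relies on signal.set_wakeup_fd() writing a byte to a non-blocking pipe whenever a signal with a Python-level handler arrives.

```python
import errno
import fcntl
import os
import select
import signal


def _set_nonblocking(fd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def install_sigchld_wakeup():
    """Return the read end of a pipe that becomes readable on SIGCHLD."""
    read_fd, write_fd = os.pipe()
    _set_nonblocking(read_fd)
    _set_nonblocking(write_fd)
    # A Python-level handler must exist for the wake-up byte to be written;
    # the handler itself deliberately does nothing.
    signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    # Ask the OS to restart interrupted system calls instead of raising EINTR.
    signal.siginterrupt(signal.SIGCHLD, False)
    signal.set_wakeup_fd(write_fd)
    return read_fd


def reap_exited_children():
    """Non-blocking reap; returns the pids that were collected."""
    pids = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except OSError as exc:
            if exc.errno == errno.ECHILD:
                break  # no children left
            raise
        if pid == 0:
            break  # remaining children are still running
        pids.append(pid)
    return pids


def wait_for_activity(listen_fd, wakeup_fd, timeout=None):
    """Return (listener_ready, reaped_pids) after one select() call."""
    try:
        readable, _, _ = select.select([listen_fd, wakeup_fd], [], [], timeout)
    except select.error as exc:
        if exc.args[0] == errno.EINTR:
            return False, []  # select() can still be interrupted; caller loops
        raise
    pids = []
    if wakeup_fd in readable:
        try:
            os.read(wakeup_fd, 4096)  # drain the wake-up bytes
        except OSError:
            pass
        pids = reap_exited_children()
    return listen_fd in readable, pids
```

The point of this design is that children are reaped in the main loop rather than inside the signal handler, so waitpid() never races with whatever the loop is doing. Whether the same idea can be fitted into ForkingMixIn without subclassing more of the server is left open here.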



Source: https://stackoverflow.com/questions/12833645/python-socketserver-stuck-on-waitpid-syscall
