Linux, waitpid, WNOHANG, child process, zombie

Posted by 人走茶凉 on 2019-11-30 19:31:51

Question


I am running my program as a daemon.

The parent process only waits for the child process; when the child dies unexpectedly, the parent forks and waits again.

for (; 1;) {
  if (fork() == 0) break;
  int sig = 0;
  for (; 1; usleep(10000)) {
    pid_t wpid = waitpid(g->pid[1], &sig, WNOHANG);
    if (wpid > 0) break;
    if (wpid < 0) print("wait error: %s\n", strerror(errno));
  }
}

But when the child process is killed with signal 9 (SIGKILL), it becomes a zombie process.

waitpid should return the pid of the child process immediately!
But waitpid only returned the pid after about 90 seconds:

cube     28139  0.0  0.0  70576   900 ?        Ss   04:24   0:07 ./daemon -d
cube     28140  9.3  0.0      0     0 ?        Zl   04:24 106:19 [daemon] <defunct>

Here is the strace output of the parent.

The parent does not get stuck; wait4 is called continuously.

strace -p 28139
Process 28139 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL)          = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0

About 90 seconds later, the parent received SIGCHLD and wait4 returned the pid of the dead child.

--- SIGCHLD (Child exited) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 28140

Why does the child process not exit immediately? Instead, it unexpectedly turns into a zombie.


Answer 1:


While tracing more deeply with lsof, I finally found that there were some fd leaks.

After the fd leaks were fixed, the problem went away.
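
As a complement to checking with lsof from outside, a process can also watch its own descriptor count. The following is only a minimal sketch, assuming Linux with /proc mounted; count_open_fds is a hypothetical helper name and not part of the original program. A count that keeps growing while the daemon runs would point to a leak.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical helper: count this process's open file descriptors by
 * listing /proc/self/fd (Linux-specific).
 * Returns -1 if the directory cannot be opened. */
int count_open_fds(void) {
  DIR *dir = opendir("/proc/self/fd");
  if (dir == NULL) return -1;

  int self_fd = dirfd(dir);  /* the directory stream itself holds one fd */
  int count = 0;
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    if (entry->d_name[0] == '.') continue;         /* skip "." and ".." */
    if (atoi(entry->d_name) == self_fd) continue;  /* skip our own handle */
    count++;
  }
  closedir(dir);
  return count;
}
```

Logging this value periodically, for example once per supervision cycle, is one way to notice a leak before it causes trouble.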




Answer 2:


It looks to me like waitpid is not returning the child pid immediately simply because that process is not yet available to be reaped.

Furthermore, it looks like this is what your code asks for: you call waitpid() with the WNOHANG option, which prevents blocking and lets the parent move on (waitpid returns 0) if the child's status is not yet available.

Maybe your process is using something you didn't expect? Can you trace its activity to see if you can find the bottleneck?

Here is a pretty useful link that might help you: http://infohost.nmt.edu/~eweiss/222_book/222_book/0201433079/ch08lev1sec6.html
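
To make the WNOHANG behaviour described in this answer concrete, here is a small sketch. The helper names spawn_idle_child and kill_and_reap are hypothetical, and the child simply sleeps in pause(), which is an assumption standing in for the real daemon worker. While the child is alive, waitpid with WNOHANG returns 0; once it has been killed and can be reaped, the call returns its pid and the status reports SIGKILL, matching the wait4 lines in the strace output.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that only sleeps in pause().
 * Returns the child's pid in the parent, or -1 if fork failed. */
pid_t spawn_idle_child(void) {
  pid_t pid = fork();
  if (pid == 0) {
    for (;;) pause();  /* child: sleep until a signal kills it */
  }
  return pid;
}

/* Hypothetical helper: send SIGKILL, then poll with WNOHANG every 10 ms
 * until the child can be reaped. Returns the number of polls that came
 * back with 0 (child not yet reapable), or -1 on error. */
int kill_and_reap(pid_t pid, int *status) {
  const struct timespec delay = {0, 10 * 1000 * 1000};  /* 10 ms */
  int empty_polls = 0;
  if (kill(pid, SIGKILL) != 0) return -1;
  for (;;) {
    pid_t wpid = waitpid(pid, status, WNOHANG);
    if (wpid == pid) return empty_polls;  /* child reaped */
    if (wpid < 0) return -1;              /* e.g. ECHILD */
    empty_polls++;                        /* 0: nothing to reap yet */
    nanosleep(&delay, NULL);
  }
}
```

For a simple child like this one the count of empty polls should stay small; a large count would correspond to the long delay described in the question.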




Answer 3:


You could simply use

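  for (;;) {
    /* Block until any child changes state; no polling needed. */
    pid_t wpid = waitpid(-1, &sig, 0);
    if (wpid > 0) break;
    if (errno == EINTR) continue;  /* interrupted by a signal: retry */
    print("wait error: %s\n", strerror(errno));
    break;  /* e.g. ECHILD: retrying would spin forever */
  }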

instead of sleeping for a while and trying again.
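
The blocking approach can also be wrapped in small helpers. This is only a sketch under stated assumptions: spawn_child_exiting_with, wait_for_child and died_unexpectedly are hypothetical names, the child that exits at once is a stand-in for real work, and the wait targets one specific pid rather than -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for real work: fork a child that exits at once
 * with the given code. Returns the child's pid, or -1 if fork failed. */
pid_t spawn_child_exiting_with(int code) {
  pid_t pid = fork();
  if (pid == 0) _exit(code);
  return pid;
}

/* Block until the given child terminates, retrying when a signal
 * interrupts the call. Returns 0 and fills *status, or -1 on error. */
int wait_for_child(pid_t pid, int *status) {
  for (;;) {
    pid_t wpid = waitpid(pid, status, 0);
    if (wpid == pid) return 0;
    if (wpid < 0 && errno != EINTR) return -1;  /* e.g. ECHILD */
  }
}

/* Treat anything other than a clean exit(0) as an unexpected death,
 * which is the case where the parent would fork again. */
int died_unexpectedly(int status) {
  return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

A supervising parent could call wait_for_child after each fork and start a new child only when died_unexpectedly reports a non-zero value.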



Source: https://stackoverflow.com/questions/22733364/linux-waitpid-wnohang-child-process-zombie
