Process stuck in exit, shows as zombie but cannot be reaped

这一生的挚爱 submitted on 2019-11-29 23:44:00

Question


I have a process that's monitored by its parent. The child encountered an error that caused it to call abort. The process does not tamper with the abort process, so it should proceed as expected (dump core, terminate). The parent is supposed to detect the child's termination and trigger a series of events to respond to the failure. The child is multi-threaded and complex.

Here's what I see from ps:

F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000  4929  1272  20   0  85440  6792 wait   S+   pts/2      0:00 rxd
1  1000  4930  4929  20   0      0     0 exit   Zl+  pts/2     38:21 [rxd] <defunct>

So the child (4930) has terminated. It is a zombie. I cannot attach to it, as expected. However, the parent (4929) stays blocked in:

int i;
// ...
waitpid (-1, &i, 0);

So it seems like the child is a zombie but somehow has not completed everything necessary for its parent to reap it. The WCHAN field of exit is, I think, a valuable clue.
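For reference, here is a minimal sketch of the monitoring pattern described above: a parent forks a child that calls abort, blocks in waitpid, and decodes the status. The function name reap_aborting_child is hypothetical, and the child here is a trivial stand-in for the real multi-threaded rxd process. In the normal case the call returns almost immediately with SIGABRT, which is the behavior the parent in the question was expecting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls abort(), block until the
 * child can be reaped, and return the signal that terminated it.
 * Returns -1 on error or if the child did not die from a signal. */
int reap_aborting_child(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        abort();                /* raises SIGABRT in the child */

    int status = 0;
    pid_t reaped;
    do {
        /* Blocks until the kernel has finished tearing the child down. */
        reaped = waitpid(child, &status, 0);
    } while (reaped < 0 && errno == EINTR);

    if (reaped != child)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return -1;
}
```

A caller would compare the result against SIGABRT to confirm that the child terminated through abort rather than a normal exit.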

The platform is 64-bit Linux, Ubuntu 13.04, kernel 3.8.0-30. The child doesn't appear to be dumping core or doing anything. I've left the system for several minutes and nothing changed.

Does anyone have any ideas what might be causing this or what I can do about it?

Update: Another interesting bit of information -- if I kill -9 the parent process, the child goes away. This is kind of baffling, since the parent process is trivial, just blocking in waitpid. Also, I don't get any core dump (from the child) when this problem happens.
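Since the parent here blocks indefinitely, one option is to poll with WNOHANG so that it can at least notice and report a child that is taking unusually long to become reapable. The following is only a sketch under that assumption: waitpid_with_timeout is a hypothetical helper, the 10 ms polling step is arbitrary, and polling does not make the child reapable any sooner.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>     /* kill(), SIGKILL for callers that give up waiting */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll for a child without blocking forever.
 * Returns 1 if a child was reaped (*status is filled in),
 *         0 if timeout_ms elapsed with no reapable child,
 *        -1 on error (for example, no such child). */
int waitpid_with_timeout(pid_t pid, int *status, int timeout_ms)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int waited_ms = 0;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r > 0)
            return 1;           /* child reaped */
        if (r < 0 && errno != EINTR)
            return -1;          /* real error */
        if (waited_ms >= timeout_ms)
            return 0;           /* still not reapable */
        nanosleep(&step, NULL);
        waited_ms += 10;
    }
}
```

A return value of 0 while ps already shows the child as defunct would match the situation described in this question, and the parent could log that fact instead of hanging silently.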

Update: It seems the child is stuck in schedule, called from exit_mm, called from do_exit. I wonder why exit_mm would call schedule. And I wonder why killing the parent would unstick it.


Answer 1:


I finally figured it out! The process was actually doing useful work all this time. The process held the last reference to a large file on a slow filesystem. When the process terminates, the last reference to the file is released, forcing the OS to reclaim the space. The file was so large that this required tens of thousands of I/O operations, taking 10 minutes or more.
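To illustrate what "holding the last reference" can look like, here is a small sketch in which a process deletes a file but keeps it open. The answer does not say how the reference was held (an open descriptor, a memory mapping, or something else), so the open-descriptor case is an assumption, as are the helper name unlinked_but_open and the /tmp path. The point is that after unlink the file has no name left, yet its blocks are only reclaimed once the last reference goes away, which for a process that never closes it happens during exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file, unlink it while keeping
 * the descriptor open, and return its link count (-1 on error).
 * On success the still-open descriptor is stored in *fd_out. */
long unlinked_but_open(int *fd_out)
{
    char path[] = "/tmp/last-ref-demo-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* Write some data, then remove the only name the file has. */
    if (write(fd, "data", 4) != 4 || unlink(path) != 0) {
        close(fd);
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    /* The descriptor is now the last reference: the data stays on disk
     * until it is closed, explicitly or implicitly at process exit. */
    *fd_out = fd;
    return (long)st.st_nlink;
}
```

With a very large file on a slow filesystem, the work that the final close triggers is what would be charged to the exiting process.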



Source: https://stackoverflow.com/questions/19059615/process-stuck-in-exit-shows-as-zombie-but-cannot-be-reaped
