Docker container with status “Dead” after consul healthcheck runs

Posted by 血红的双手。 on 2019-12-20 12:05:34

Question


I am using Consul's healthcheck feature, and I keep getting these "dead" containers:

CONTAINER ID  IMAGE                   COMMAND              CREATED         STATUS              PORTS                                                                                                                                                                    NAMES
20fd397ba638  progrium/consul:latest  "\"/bin/bash -c 'cur 15 minutes ago  Dead

What exactly is a "Dead" container? When does a stopped container become "Dead"?

For the record, I run the progrium/consul and gliderlabs/registrator images with SERVICE_XXXX_CHECK env variables to do health checking. Every X seconds the check runs a healthcheck script inside a fresh container, something like docker run --rm my/img healthcheck.sh

In general, I'm interested in what "dead" means and how to prevent it from happening. Another peculiar thing is that my dead containers have no name.

This is some info from the container inspection:

  "State": {
        "Dead": true,
        "Error": "",
        "ExitCode": 1,
        "FinishedAt": "2015-05-30T19:00:01.814291614Z",
        "OOMKilled": false,
        "Paused": false,
        "Pid": 0,
        "Restarting": false,
        "Running": false,
        "StartedAt": "2015-05-30T18:59:51.739464262Z"
    },

The strange thing is that only every now and then a container becomes dead and isn't removed.

Thank you

Edit: Looking at the logs, I found the error behind the failure:

  Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc: 
Driver aufs failed to remove root filesystem 003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc: 
rename /var/lib/docker/aufs/diff/003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc 
/var/lib/docker/aufs/diff/003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc-removing: 
device or resource busy

Why does this happen?

Edit 2: I found this: https://github.com/docker/docker/issues/9665


Answer 1:


Update March 2016: issue 9665 has just been closed by PR 21107 (possibly for Docker 1.11).
That should help avoid the "Driver aufs failed to remove root filesystem", "device or resource busy" problem.


Original answer May 2015

Dead is one of the container states, which is tested by Container.Start():

if container.removalInProgress || container.Dead {
        return fmt.Errorf("Container is marked for removal and cannot be started.")
}

The container is marked Dead when stopping it fails, in order to prevent it from being restarted.

Among the possible causes of failure, see container.Kill().
It means that kill -15 and kill -9 are both failing.

// 1. Send a SIGTERM
if err := container.killPossiblyDeadProcess(15); err != nil {
    logrus.Infof("Failed to send SIGTERM to the process, force killing")
    if err := container.killPossiblyDeadProcess(9); err != nil {

That usually means, as the OP mentions, a busy device or resource that prevents the process from being killed.




Answer 2:


There are a lot of bugs caused by EBUSY, in particular when devicemapper is used.

There is a tracker bug for all of the EBUSY-related issues; see https://github.com/docker/docker/issues/5684#issuecomment-69052334



Source: https://stackoverflow.com/questions/30550472/docker-container-with-status-dead-after-consul-healthcheck-runs
