Tomcat shutting down by itself

匿名 (未验证) 提交于 2019-12-03 03:10:03

问题:

I'm running Tomcat 7.0.53 on CentOS 6.5 64 bit and OpenJDK 1.7 64.

I have several servers where occasionally - like randomly once per week - Tomcat will just gracefully shut itself down. I've been looking at this for months and cannot find the reason. The only pattern seems to be a long period of nothing going on then the shutdown.

  • I have upgraded to latest versions of the underlying tools (Tomcat, Java, etc)
  • I am running with the default memory settings
  • I have disabled the SHUTDOWN in the server.xml
  • I have moved MySQL to another box to ensure Tomcat is the only substantial application (server also runs nginx)
  • I have verified that I have no System.exit() in any of my code. I have not scanned the libs as I can't figure out how to do that. And I don't expect I will find anything there either. Why would a library exit anyway?
  • I have enabled logging of garbage collection. But they seem pretty quick - like full GC in 0.15s

Here is the memory usage of one of the servers. Lots of RAM free.

                 total       used       free     shared    buffers     cached     Mem:          2006        771       1234          0        176        281     -/+ buffers/cache:        313       1692     Swap:         2047          0       2047 

Below is one of the events in the Catalina.out. You can see it starts then does nothing exciting for several hours. Then gracefully shuts itself down as if it was told to.

I've researched this to death and have not bee able to get a handle on this.

Can someone please propose a plan of action for me?

Thanks

From the Server.xml:

    <Server port="-1" shutdown="__SHUTDOWN__"> 

From Catalina.out:

Apr 28, 2014 5:34:50 PM org.apache.tomcat.util.digester.SetPropertiesRule begin WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property 'remoteIpProxiesHeader' to 'x-forwarded-by' did not find a matching property. Apr 28, 2014 5:34:50 PM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler ["http-apr-8080"] Apr 28, 2014 5:34:50 PM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler ["ajp-apr-8009"] Apr 28, 2014 5:34:50 PM org.apache.catalina.startup.Catalina load INFO: Initialization processed in 3247 ms Apr 28, 2014 5:34:50 PM org.apache.catalina.core.StandardService startInternal INFO: Starting service Catalina Apr 28, 2014 5:34:50 PM org.apache.catalina.core.StandardEngine startInternal INFO: Starting Servlet Engine: Apache Tomcat/7.0.53 Apr 28, 2014 5:34:59 PM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler ["http-apr-8080"] Apr 28, 2014 5:34:59 PM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler ["ajp-apr-8009"] Apr 28, 2014 5:34:59 PM org.apache.catalina.startup.Catalina start INFO: Server startup in 8278 ms Apr 28, 2014 5:41:53 PM org.apache.jasper.compiler.TldLocationsCache tldScanJar INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time. Apr 28, 2014 10:32:32 PM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["http-apr-8080"] Apr 28, 2014 10:32:32 PM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["ajp-apr-8009"] Apr 28, 2014 10:32:32 PM org.apache.catalina.core.StandardService stopInternal INFO: Stopping service Catalina Apr 28, 2014 10:32:32 PM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["http-apr-8080"] Apr 28, 2014 10:32:32 PM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["ajp-apr-8009"] Apr 28, 2014 10:32:32 PM org.apache.coyote.AbstractProtocol destroy INFO: Destroying ProtocolHandler ["http-apr-8080"] Apr 28, 2014 10:32:32 PM org.apache.coyote.AbstractProtocol destroy INFO: Destroying ProtocolHandler ["ajp-apr-8009"] May 05, 2014 8:10:32 PM org.apache.catalina.core.AprLifecycleListener init INFO: Loaded APR based Apache Tomcat Native library 1.1.29 using APR version 1.3.9. May 05, 2014 8:10:32 PM org.apache.catalina.core.AprLifecycleListener init INFO: APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true]. May 05, 2014 8:10:32 PM org.apache.tomcat.util.digester.SetPropertiesRule begin 

回答1:

Something is telling Tomcat to shut down.

Tomcat does not gracefully shut down when it's "in trouble" (out of memory, or anything else). Linux does not gracefully shut down processes when it's low on memory, it stops them hard with a kill -9, which leave no trace in the Tomcat log.

  • kill -15 will shut down tomcat gracefully, via the shut down hook. (The answer previously said kill -3 here, which is simply causing a thread dump.)

  • An internal exit() will shut it down gracefully as well, again via the shut down hook.

  • It can be shut down via JMX, and, of course the shut down command from Catalina.

So, someone, somewhere, is telling Tomcat to do this. This isn't some unknown, internal "problem". Those kind of things just kill tomcat outright, they don't ask politely.

One thing you might try is to enable DEBUG for logging, to see what chatter you get. If you get too much, you can try looking specifically at org.apache.catalina.core.StandardContext. This may well not give you more than it already being logged, but it may tell you something.

After that, if so motivated, and it continues, I'd suggest building Tomcat yourself, and interjecting some stack trace dumps.

For example, in org.apache.coyote.AbstractProtcol, you can add:

Exception e = new Exception("Shut down trace"); getLog().info("Shutting down protocol", e); 

to the stop() method.

That will give you a solid stack trace to learn from whence this came. Tomcat isn't really a nasty event storm of just random messages floating around. The shut down is pretty synchronous. So, that stack trace will give you a solid lead as to who started the party. Then you can look from there to see how that could have happened.

There's a bunch of interfaces and abstractions in Tomcat, but really only a few implementations, so it's not impenetrable. And having a smoking stack trace will help immensely in pinning it down.



回答2:

There is a known (and fixed bug) in Tomcat versions prior to v7.0.55:

https://bz.apache.org/bugzilla/show_bug.cgi?id=56684

The thread is really good reading but I can summarize it for you. When you start Tomcat there is a wait state that starts waiting for the shutdown command. This is normal, but the bug in these versions is an implicit timeout on this wait of about 50 days. When the wait state timed out, the Tomcat would shutdown.

So there ARE cases of odd things happening in Tomcat and NO, you are not crazy. This was happening to a SOLR cluster that I maintained for more than a year before I found this bug and fixed it with an upgrade. No one ever noticed because it took so long to crash and with all that goes on it can be easy to brush past finding the root cause. The easiest thing is to just restart Tomcat and forget it ever happened.

I hope this answer helps you. I see that you are/were running a version with this bug.

Good luck!



回答3:

How is your Tomcat started? If there is a parent process, a signal sent to the parent could be the cause of a clean shutdown.



回答4:

A case in DC/OS

Recently we had a similar incident. Our tomcat is running in a docker as a service in a DC/OS environment. The stderr just said tomcat was killed, and there was no useful traces in any log files of tomcat.

It turned out the reason was simply that we did not allocate enough memory to the service. More debugging suggestions are here, but the key technique we used is to check the kernel log:

# journalctl -f _TRANSPORT=kernel ... Mar 29 15:09:09 host-17 kernel: Memory cgroup out of memory: Kill process 7935 (java) score 1021 or sacrifice child Mar 29 15:09:09 host-17 kernel: Killed process 7906 (java) total-vm:11583840kB, anon-rss:1032860kB, file-rss:30924kB, shmem-rss:0kB ... 

Once we found that out, we increased the memory allocation to the service in DC/OS and the problem was solved.



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!