I\'m running CF 9.0.1 on Ubuntu on an \"Medium\" Amazon EC2 instance. CF has been seizing-up intermittently (several times per day...but notably not isolated to hours of pea
I've had a number of 'high-cpu in production' type bugs and the way i've always dealt with them is this:
Use jstack PID >> stack.log to dump 5 of stack traces, 5 seconds apart. Number of traces and timing not critical.
Open the log in Samurai. You get a view of the threads at each time you did a dump. Threads processing your code start web- (for requests using the built-in server) and jrpp- for requests coming in through Apache/IIS.
Read the history of each thread. You're looking for the stack being very similar in each dump. If a thread looks like it's handling the same request the whole time, the bits that vary near the top will point to where an infinite loop is happening.
Feel free to dump a stack trace somewhere online and point us to it.
The other technique I've used to understand what's going on is to modify apache's httpd.conf to log time taken: %D and record session id: %{jsessionid} which allows you to trace individual users in the run-up to hangs and to do some nice stats/graphs with the data (I use LogParser to crunch the numbers and output to CSV, followed by Excel to graph the data):
LogFormat "%h %l %u %t "%r" %>s %b %D %{jsessionid}" customAnalysis
CustomLog logs/analysis_log customAnalysis
One other technique I've just remembered is to enable CF Metrics, which will get you some measure of what the server was up to in the runup to a hang. I set this to log every 10 seconds and change the format to be CSV, so I can grep the metrics from the event log and then run them through Excel to graph server load in the runup to crashes.
Barny