Question
A few days ago, I ran into an issue where a client was receiving responses from a Play application only after 20 seconds. I have New Relic set up on the production server, which reports RPM, average response time, CPU and memory usage, etc. According to New Relic, the response time never exceeded 500 milliseconds, but I verified that the client was receiving the response after 20 seconds. To dig deeper, I added logging that reports the time required to serve each request inside the Play application. I added the logging in a Filter as follows:
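import java.text.SimpleDateFormat
import java.util.Calendar
import play.api.Logger
import play.api.http.HeaderNames._
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.mvc.Filter

val noCache = Filter { (next, rh) =>
  // Timer starts when the filter is invoked, i.e. after Play has accepted and parsed the request.
  val startTime = System.currentTimeMillis
  next(rh).map { result =>
    // Timer stops when the Result is available, before it is written to the socket.
    val requestTime = System.currentTimeMillis - startTime
    Logger.warn(s"${rh.method} ${rh.uri} took ${requestTime}ms and returned ${result.header.status}")
    result.withHeaders(
      PRAGMA -> "no-cache",
      CACHE_CONTROL -> "no-cache, no-store, must-revalidate, max-age=0",
      EXPIRES -> serverTime
    )
  }
}

// Current server time formatted for the Expires header.
private def serverTime = {
  val calendar = Calendar.getInstance()
  val dateFormat = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z")
  dateFormat.setTimeZone(calendar.getTimeZone)
  dateFormat.format(calendar.getTime())
}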
During my load test, I sent around 3K concurrent requests to the Play app and captured a tcpdump of all requests. These are my observations:
- According to the Play application log, the maximum time the Play app took to respond was 68 milliseconds.
- According to tcpdump, the maximum time to respond to any request was around 10 seconds.
- According to New Relic, the maximum response time was around 84 milliseconds (this is very close to the value from the logs I added, so we can ignore it).
As far as I know, a Filter is one of the last stages in the request-response life cycle. So if the logs in the Filter say that the request needed 68 milliseconds, and tcpdump shows that the response was sent after 10 seconds, what caused the delay in responding to the request?
I understand that in a multi-threaded environment a context switch can happen after any statement executes. But a context switch should not cause this much delay. According to New Relic, there were fewer than 50 threads during this load test.
Can someone explain what could cause this? Deeper insights into the request-response life cycle are welcome.
Answer 1:
I was able to fix the above issue by increasing the file descriptor (FD) limit. The FD limit was causing the late responses.
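Since every open TCP connection consumes one file descriptor, it can help to watch how close the JVM gets to its limit during a load test. The following is only a minimal sketch, not something from the original setup: `FdUsage` and `snapshot` are hypothetical names, and it assumes a Unix-like JVM that exposes `com.sun.management.UnixOperatingSystemMXBean`.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical helper (not part of Play): reads FD usage of the current JVM process.
object FdUsage {
  // Returns Some((open, max)) on Unix-like JVMs, None where the MXBean is unavailable.
  def snapshot(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean =>
        Some((unix.getOpenFileDescriptorCount, unix.getMaxFileDescriptorCount))
      case _ =>
        None
    }
}
```

Calling `FdUsage.snapshot()` periodically (for example from the same Filter, or from a scheduled task) and logging the pair would show whether the open count approaches the maximum under load. If it does, that would be consistent with requests waiting before they ever reach the Filter, which the Filter's timer cannot see.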
Source: https://stackoverflow.com/questions/41330430/request-response-life-cycle-in-playscala-2-4-x