Play framework resource starvation after a few days

柔情痞子 提交于 2020-01-17 05:53:30

问题


I am experiencing an issue in Play 2.5.8 (Java) where database related service endpoints starts timing out after a few days even though the server CPU & memory usage seems fine. Endpoints that does not access the DB continue to work perfectly.

The application runs on a t2.medium EC2 instance with a t2.medium MySQL RDS, both in the same availability zone. Most HTTP calls do lookups/updates to the database with around 8-12 requests per second, and there are also ±800 WebSocket connections/actors with ±8 requests/second (90% of the WebSocket messages does not access the database). DB operations are mostly simple lookups & updates taking around 100ms.

When using only the default thread pool it took about 2 days to reach the deadlock, and after moving the database requests to a separate thread pool as per https://www.playframework.com/documentation/2.5.x/ThreadPools#highly-synchronous, it improved but only to about 4 days.

This is my current thread config in application.conf:

akka {
  actor {
    guardian-supervisor-strategy = "actors.RootSupervisionStrategy"
  }
  loggers = ["akka.event.Logging$DefaultLogger",
    "akka.event.slf4j.Slf4jLogger"]
  loglevel = WARNING

  ## This pool handles all HTTP & WebSocket requests
  default-dispatcher {
      executor = "thread-pool-executor"
      throughput = 1
      thread-pool-executor {
        fixed-pool-size = 64 
      }
  }

  db-dispatcher {
    type = Dispatcher
    executor = "thread-pool-executor"
    throughput = 1
    thread-pool-executor {
      fixed-pool-size = 210 
    }
  }
}

Database configuration:

play.db.pool="default"
play.db.prototype.hikaricp.maximumPoolSize=200
db.default.driver=com.mysql.jdbc.Driver

I have played around with the amount of connections in the DB pool & adjusting the size of the default & db-dispatcher pool size but it doesn't seem to make any difference. It feels I'm missing something fundamental about Play's thread pools & configuration as I don't think the load on the server should not be an issue for Play to handle.


回答1:


After more investigation I found that the issue is not related to thread pool configuration at all, but rather TCP connections that build up due to WS reconnections until the server (or Play framework) cannot accept any more connections. When this happens, only established TCP connections are serviced which mostly includes the established WebSocket connections.

I could not yet determine why the connections are not managed/closed properly.

My issue relates to this question:

Play 2.5 WebSocket Connection Build



来源:https://stackoverflow.com/questions/39930134/play-framework-resource-starvation-after-a-few-days

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!