Get current number of running containers in Spark on YARN

Submitted by 微笑、不失礼 on 2020-01-17 08:01:28

Question


I have a Spark application running on YARN. Given an RDD of queries, I need to execute them against a database. The problem is that I have to set the connection options properly, otherwise the database will be overloaded, and these options depend on the number of workers querying the database simultaneously. To solve this, I want to detect the current number of running workers at runtime (from within a worker). Something like this:

val totalDesiredQPS = 1000 // total queries per second across all workers
val queries: RDD[String] = ???
queries.mapPartitions(it => {
  val dbClientForThisWorker = ...
  // TODO: get this information from YARN somehow
  val numberOfContainers = ???
  // Give each worker an equal share of the total QPS budget
  dbClientForThisWorker.setQPS(totalDesiredQPS / numberOfContainers)
  it.map(query => dbClientForThisWorker.executeAsync...)
  ....
})

I would also appreciate alternative solutions, but I want to avoid a shuffle and get almost full database utilization regardless of the number of workers.
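
To make the intended arithmetic concrete, here is a minimal sketch of the per-worker split on its own, with no Spark dependency. The object name `QpsSplit` and the helper `perWorkerQps` are hypothetical and do not appear in the original post. The sketch assumes the container count has already been obtained somewhere. One possibility (an assumption, not a confirmed answer to the question) is to read it on the driver, for example from the size of `SparkContext.getExecutorMemoryStatus`, which also counts the driver, and capture it in the closure. Such a value would be a snapshot and may go stale if executors are added or removed while the job runs.

```scala
object QpsSplit {
  // Hypothetical helper: divide a global QPS budget evenly across workers.
  // A non-positive worker count is treated as 1, and the result is never
  // below 1 so that every worker can still issue queries.
  def perWorkerQps(totalDesiredQps: Int, numberOfContainers: Int): Int = {
    require(totalDesiredQps > 0, "totalDesiredQps must be positive")
    val workers = math.max(1, numberOfContainers)
    math.max(1, totalDesiredQps / workers)
  }

  def main(args: Array[String]): Unit = {
    // 1000 QPS shared by 8 containers
    println(perWorkerQps(1000, 8)) // prints 125
  }
}
```

Integer division rounds down, so the combined rate can fall slightly below the target (for example, 1000 QPS over 3 workers gives 333 each, 999 in total). Flooring at 1 means the total can exceed the budget when there are more workers than queries per second.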

Source: https://stackoverflow.com/questions/46825993/get-current-number-of-running-containers-in-spark-on-yarn
