Spark: get number of cluster cores programmatically

庸人自扰 2020-12-09 05:35

I run my Spark application on a YARN cluster. In my code I use the number of cores available to the queue to create partitions on my dataset:

Dataset ds = ...
ds.coalesce(numberOfCores); // numberOfCores: placeholder for the value currently read from configuration

How can I get the number of available cores programmatically, rather than from configuration?
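
A minimal sketch of the configuration-driven variant the question describes, assuming the core count is derived from standard submit-time properties (the helper class and the defaults are illustrative, not from the original post):

import org.apache.spark.SparkConf;

public class QueueCores {
    // Hypothetical helper: derive the queue's core count from the
    // submit-time settings spark.executor.instances and spark.executor.cores.
    static int coresFromConf(SparkConf conf) {
        int executors = Integer.parseInt(conf.get("spark.executor.instances", "1"));
        int coresPerExecutor = Integer.parseInt(conf.get("spark.executor.cores", "1"));
        return executors * coresPerExecutor;
    }
}

Note that with dynamic allocation enabled, spark.executor.instances may not be set at all, which is one reason a programmatic answer is preferable.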


        
4 Answers
  •  一生所求
    2020-12-09 06:17

    Found this while looking for the answer to pretty much the same question.

    I found that:

    Dataset ds = ...
    ds.coalesce(sc.defaultParallelism());
    

    does exactly what the OP was looking for.

    For example, my 5-node x 8-core cluster returns 40 for defaultParallelism.
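
    A self-contained sketch of the same idea, for anyone who wants to run it end to end (the class name and the example dataset are mine, not the OP's; on YARN the master and executor settings would normally come from spark-submit rather than the builder):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DefaultParallelismExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("DefaultParallelismExample")
                    .getOrCreate();

            // On YARN this is typically the total number of cores across all
            // executors (e.g. 5 nodes x 8 cores = 40); in local mode it is
            // the number of local cores.
            int cores = spark.sparkContext().defaultParallelism();
            System.out.println("defaultParallelism = " + cores);

            // One partition per available core, as in the snippet above.
            Dataset<Row> ds = spark.range(1_000_000).toDF();
            Dataset<Row> coalesced = ds.coalesce(cores);
            System.out.println("partitions = " + coalesced.rdd().getNumPartitions());

            spark.stop();
        }
    }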
