How to load a file in each executor once?

太阳男子 2020-12-06 23:48

I define the following code in order to load a pretrained embedding model:

import gensim

from gensim.models.fasttext import FastText as FT_gensim
import nu         


        
1 Answer
  • 2020-12-07 00:42

    By default, Spark sets the number of partitions to the total number of cores across all executor nodes in the cluster. Suppose you are processing 10 GB on a Spark cluster (or a supercomputing executor) with a total of 200 CPU cores: Spark might then use 200 partitions by default to process your data.
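
To make the arithmetic in that example concrete, here is a minimal sketch that works out how much data each partition would hold if the partition count really does equal the total core count. The helper name `default_partition_estimate` is made up for illustration, and the one-partition-per-core rule is simply the assumption stated above; the actual count depends on the Spark configuration and the data source.

```python
def default_partition_estimate(total_cores, data_gb):
    """Estimate partition count and size, ASSUMING one partition per core.

    This only mirrors the rule of thumb from the answer; it is not a Spark API.
    """
    partitions = total_cores
    # Convert GB to MB and split the data evenly across partitions.
    mb_per_partition = data_gb * 1024 / partitions
    return partitions, mb_per_partition


partitions, mb_each = default_partition_estimate(total_cores=200, data_gb=10)
print(f"{partitions} partitions of about {mb_each:.1f} MB each")
```

With the numbers from the answer (200 cores, 10 GB), each partition would hold roughly 51 MB.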

    Also, to keep all CPU cores busy on each executor, you can handle this in Python with the multiprocessing module, which can use 100% of all cores.
