I use the following code to load a pretrained embedding model:
import gensim
from gensim.models.fasttext import FastText as FT_gensim
import numpy as np  # assumed completion of the truncated "import nu"
By default, the number of partitions is set to the total number of cores on all the executor nodes in the Spark cluster. For example, if you are processing 10 GB of data on a Spark cluster (or supercomputing executor) with a total of 200 CPU cores, Spark will by default use 200 partitions to process your data.
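As a rough sketch of that idea (assuming PySpark, a hypothetical app name "partition-demo", and a placeholder input file "data.txt"), you can inspect the default parallelism and repartition explicitly like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-demo").getOrCreate()
sc = spark.sparkContext

# default parallelism usually equals the total number of executor cores
print(sc.defaultParallelism)

# load a text file, check how many partitions Spark chose, then repartition explicitly
rdd = sc.textFile("data.txt")
print(rdd.getNumPartitions())
rdd = rdd.repartition(200)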
Also, to keep every CPU core busy on each executor, this can be handled in Python itself, using the multiprocessing module to drive 100% of the available cores.
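A minimal sketch of that approach, assuming a hypothetical CPU-bound function process_chunk applied to a list of data chunks (one worker process per core):

import multiprocessing as mp

def process_chunk(chunk):
    # placeholder for CPU-bound work, e.g. building vectors with the FastText model
    return sum(chunk)

if __name__ == "__main__":
    chunks = [list(range(i, i + 1000)) for i in range(0, 10000, 1000)]
    # one worker per available core keeps all cores busy
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(process_chunk, chunks)
    print(len(results))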