Should I pre-install CRAN R packages on worker nodes when using SparkR?

Asked by 甜味超标, 2021-01-14 15:42

I want to use R packages from CRAN, such as forecast, with SparkR, and I have run into the following two problems.

  1. Should I pre-install all those packages on worker nodes?

3 Answers
  •  Answered by 长情又很酷, 2021-01-14 16:03

    It is boring to repeat this, but you shouldn't use the internal RDD API in the first place. It was removed in the first official SparkR release and is simply not suitable for general usage.

    Until a new low-level API* is ready (see for example SPARK-12922, SPARK-12919, SPARK-12792), I wouldn't consider Spark a platform for running plain R code. Even when that changes, adding native (Java / Scala) code with R wrappers can be a better choice.

    That being said, let's start with your question:

    1. RPackageUtils is designed to handle packages created with Spark Packages in mind. It doesn't handle standard R libraries.

    2. Yes, you need the packages to be installed on every node. From the includePackage docstring:

      The package is assumed to be installed on every node in the Spark cluster.
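
    In practice that means running the installation step on each worker before submitting the job, for example via ssh, a bootstrap script, or a configuration-management tool. A minimal sketch of what each node would run (the repository URL and package list are illustrative, not from the original post):

    ```r
    # Run once on every worker node (and the driver), e.g. from a
    # cluster bootstrap script. Adjust the package list to your needs.
    install.packages(
      c("forecast"),
      repos = "https://cloud.r-project.org"
    )
    ```

    Keeping the R library paths and package versions identical across nodes avoids hard-to-debug failures when a closure runs on a worker with a different environment.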


    * If you use Spark 2.0+ you can use the dapply, gapply and spark.lapply functions.
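
    As a rough sketch of that Spark 2.0+ approach (assuming a working SparkR session; the data and schema here are made up for illustration), dapply applies an R function to each partition of a SparkDataFrame, and the function body executes on the workers, so any package it loads must already be installed there:

    ```r
    library(SparkR)
    sparkR.session()

    df <- createDataFrame(data.frame(x = 1:10))

    # Schema describing the data.frame returned by the function below.
    schema <- structType(
      structField("x", "integer"),
      structField("x2", "double")
    )

    # This function runs on the worker nodes; a library() call to a CRAN
    # package (e.g. forecast) inside it requires that package on every node.
    result <- dapply(df, function(pdf) {
      pdf$x2 <- pdf$x ^ 2
      pdf
    }, schema)

    head(collect(result))
    ```

    gapply works the same way but groups the data by a key first, and spark.lapply distributes a plain R list across the cluster.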
