How to pass variables to functions called in spark_apply()?


Question


I would like to be able to pass extra variables to functions that are called by spark_apply in sparklyr.

For example:

# setup
library(sparklyr)
sc <- spark_connect(master = "local", packages = TRUE)
iris2 <- iris[, 1:(ncol(iris) - 1)]  # drop the Species factor so only numeric columns remain
df1 <- sdf_copy_to(sc, iris2, repartition = 5, overwrite = TRUE)

# This works fine
res <- spark_apply(df1, function(x) kmeans(x, 3)$centers)

# This does not: k from the calling environment is not available on the workers
k <- 3
res <- spark_apply(df1, function(x) kmeans(x, k)$centers)

As an ugly workaround, I can achieve what I want by saving values in an R package and then referencing them, i.e.:

> myPackage::k_equals_three == 3
[1] TRUE

# This also works
res <- spark_apply(df1, function(x) kmeans(x, myPackage::k_equals_three)$centers)

Is there a better way to do this?


Answer 1:


I don't have Spark set up to test, but can you just create a closure?

kmeanswithk <- function(k) {
  force(k)  # evaluate k now so it is captured in the closure
  function(x) kmeans(x, k)$centers
}
k <- 3
res <- spark_apply(df1, kmeanswithk(k))

Basically, create a function that returns a function, then use that.
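If you need to capture more than one value, the same factory pattern extends naturally. A minimal sketch, untested against Spark; kmeans_with_params is a hypothetical name and iter.max is just an illustrative extra kmeans argument:

kmeans_with_params <- function(k, iter_max) {
  # force both arguments so they are evaluated before the closure is serialized
  force(k); force(iter_max)
  function(x) kmeans(x, centers = k, iter.max = iter_max)$centers
}

res <- spark_apply(df1, kmeans_with_params(3, 20))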




Answer 2:


spark_apply() now has a context argument that lets you pass additional objects/variables to the worker environment.

res <- spark_apply(df1, function(x, k) {
  kmeans(x, k)$cluster
}, context = {k <- 3})

or

k <- 3
res <- spark_apply(df1, function(x, k) {
  kmeans(x, k)$cluster
}, context = {k})

The R documentation does not yet include examples of the context argument, but you can learn more from the PR that introduced it: https://github.com/rstudio/sparklyr/pull/1107.
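Since context is an ordinary R object shipped to the workers, you can pass several values at once by making it a list. A minimal sketch, untested here; params, ctx, and iter_max are illustrative names:

params <- list(k = 3, iter_max = 20)

res <- spark_apply(df1, function(x, ctx) {
  kmeans(x, centers = ctx$k, iter.max = ctx$iter_max)$cluster
}, context = params)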



Source: https://stackoverflow.com/questions/46349921/how-to-pass-variables-to-functions-called-in-spark-apply
