Using apply functions in SparkR

Submitted by 旧城冷巷雨未停 on 2019-12-10 03:53:55

Question


I am currently trying to implement some functions using SparkR version 1.5.1. I have seen older (version 1.3) examples where people used the apply function on DataFrames, but it looks like this is no longer directly available. Example:

x = c(1,2)
xDF_R = data.frame(x)
colnames(xDF_R) = c("number")
xDF_S = createDataFrame(sqlContext,xDF_R)

Now, I can use the function sapply on the data.frame object

xDF_R$result = sapply(xDF_R$number, ppois, q=10)

When I use a similar logic on the DataFrame

xDF_S$result = sapply(xDF_S$number, ppois, q=10)

I get the error message "Error in as.list.default(X) : no method for coercing this S4 class to a vector"
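
The error arises because xDF_S$number is not an R vector but a SparkR Column, an S4 object that describes a column rather than holding its values, and sapply tries to coerce its input with as.list. A minimal sketch in base R, using a made-up S4 class FakeColumn (hypothetical, standing in for SparkR's Column) to reproduce the same kind of failure without a Spark session:

```r
library(methods)

# Hypothetical stand-in for SparkR's Column class: an S4 object that
# holds a description of a column, not the values themselves.
setClass("FakeColumn", representation(name = "character"))
col <- new("FakeColumn", name = "number")

# sapply() first coerces its input with as.list(), which has no method
# for an arbitrary S4 object, so the call fails before ppois() ever runs.
res <- tryCatch(
  sapply(col, ppois, q = 10),
  error = function(e) conditionMessage(e)
)
print(res)
```

The local data.frame works because xDF_R$number is an ordinary numeric vector that sapply can iterate over.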

Can I somehow do this?


Answer 1:


This is possible with user-defined functions (dapply and dapplyCollect), available since Spark 2.0:

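# The wrapper receives each partition as a plain R data.frame
wrapper = function(df){
    out = df
    out$result = sapply(df$number, ppois, q=10)
    return(out)
}
# Apply the wrapper to every partition and collect the result locally
xDF_S2 = dapplyCollect(xDF_S, wrapper)
identical(xDF_S2, xDF_R)
# [1] TRUE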

Note that you need a wrapper function like this because dapplyCollect only accepts a function of one argument (the data.frame), so extra arguments such as q cannot be passed in directly. That may change in the future.
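
One way to avoid hard-coding q inside the wrapper is a closure: a function that takes the extra arguments and returns the one-argument function that dapplyCollect expects. The sketch below is an illustration under that assumption, not part of the original answer; make_wrapper is a hypothetical helper name, and the check runs on a plain data.frame, which is what the wrapper receives for each partition.

```r
# Hypothetical helper: builds a one-argument function suitable for
# dapply()/dapplyCollect(), capturing the extra argument q in a closure.
make_wrapper <- function(q) {
  function(df) {
    out <- df
    # ppois(q, lambda): each value in df$number is used as lambda
    out$result <- sapply(df$number, ppois, q = q)
    out
  }
}

# Local check on an ordinary data.frame, no Spark session needed
local_df <- data.frame(number = c(1, 2))
wrapped <- make_wrapper(q = 10)(local_df)
expected <- ppois(10, lambda = c(1, 2))
stopifnot(isTRUE(all.equal(wrapped$result, expected)))
```

With a Spark session running, this would presumably be used as dapplyCollect(xDF_S, make_wrapper(q = 10)). SparkR ships the function to the workers along with the variables it captures, but it is worth verifying that on your own setup.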



Source: https://stackoverflow.com/questions/33286030/using-apply-functions-in-sparkr
