How to use spark_apply to change NaN values?

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 06:59:48

Question


After using sdf_pivot I was left with a huge number of NaN values, so to proceed with my analysis I need to replace the NaN with 0. I tried this:

data <- data %>% 
  spark_apply(function(e) ifelse(is.nan(e),0,e))

And this generates the following error:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file 
'C:\.........\file18dc5a1c212e_spark.log':Permission denied

I'm using Spark 2.2.0 and the latest version of sparklyr.

Does anyone have an idea how to fix this issue? Thanks.


Answer 1:


You seem to have two different problems here.

  • Permission issues. Make sure that you have the required permissions and, if necessary, that winutils is set up correctly.
  • NULL replacement.

The latter can be solved using built-in functions; there is no need for an inefficient spark_apply:

df <- copy_to(sc, 
  data.frame(id=c(1, 1, 2, 3), key=c("a", "b", "a", "d"), value=1:4))

pivoted <- sdf_pivot(df, id ~ key)
pivoted
# Source:   table<sparklyr_tmp_f0550e429aa> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1   NaN
2     3   NaN   NaN     1
3     2     1   NaN   NaN
pivoted %>% na.replace(0)
# Source:   table<sparklyr_tmp_f0577e16bf1> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0
2     3     0     0     1
3     2     1     0     0

Tested with sparklyr 0.7.0-9105.
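
If spark_apply is nonetheless required (for example, for logic that cannot be expressed with built-in functions), note that the function passed to it receives each partition as a plain R data.frame, not a single vector, so the replacement should be applied column by column. A minimal sketch, assuming the connection sc and the pivoted table from above:

# Sketch only: assumes an existing connection `sc` and the `pivoted`
# table created above. Inside spark_apply the supplied function is
# called once per partition with a plain R data.frame.
cleaned <- pivoted %>%
  spark_apply(function(e) {
    # is.na() is TRUE for both NA and NaN in R, so this zeroes out
    # every missing value in each numeric column
    e[] <- lapply(e, function(col) {
      if (is.numeric(col)) col[is.na(col)] <- 0
      col
    })
    e
  })

That said, na.replace(0) stays preferable: it runs entirely in Spark SQL, while spark_apply serializes each partition to an R process and back.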



Source: https://stackoverflow.com/questions/47699198/how-to-use-spark-apply-to-change-nan-values
