How to use spark_apply to change NaN values?

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 06:59:48

Question


After using sdf_pivot I was left with a huge number of NaN values, so to proceed with my analysis I need to replace the NaN with 0. I tried this:

data <- data %>% 
  spark_apply(function(e) ifelse(is.nan(e),0,e))

And this generates the following error:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file 
'C:\.........\file18dc5a1c212e_spark.log':Permission denied

I'm using Spark 2.2.0 and the latest version of sparklyr.

Does anyone have an idea how to fix this issue? Thanks.


Answer 1:


You seem to have two different problems here.

  • Permission issues. Make sure that you have the required permissions and, if necessary, that winutils is set up correctly.
  • NULL replacement.

The latter can be solved using built-in functions; there is no need for an inefficient spark_apply:

df <- copy_to(sc, 
  data.frame(id=c(1, 1, 2, 3), key=c("a", "b", "a", "d"), value=1:4))

pivoted <- sdf_pivot(df, id ~ key)
pivoted
# Source:   table<sparklyr_tmp_f0550e429aa> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1   NaN
2     3   NaN   NaN     1
3     2     1   NaN   NaN
pivoted %>% na.replace(0)
# Source:   table<sparklyr_tmp_f0577e16bf1> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0
2     3     0     0     1
3     2     1     0     0

Tested with sparklyr 0.7.0-9105.
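
If spark_apply is nonetheless required (for example, for logic that cannot be expressed with built-in functions), note that the function passed to it receives each partition as a plain R data.frame, not a single vector, so the replacement should be applied column by column. A minimal sketch, assuming the connection sc and the pivoted table from above:

# Sketch only: assumes an existing connection `sc` and the `pivoted`
# table created above. Inside spark_apply the supplied function is
# called once per partition with a plain R data.frame.
cleaned <- pivoted %>%
  spark_apply(function(e) {
    # is.na() is TRUE for both NA and NaN in R, so this zeroes out
    # every missing value in each numeric column
    e[] <- lapply(e, function(col) {
      if (is.numeric(col)) col[is.na(col)] <- 0
      col
    })
    e
  })

That said, na.replace(0) stays preferable: it runs entirely in Spark SQL, while spark_apply serializes each partition to an R process and back.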



Source: https://stackoverflow.com/questions/47699198/how-to-use-spark-apply-to-change-nan-values
