Add a column full of NAs in SparkR

Question

How do I add a column full of NAs to a SparkR DataFrame? This doesn't work:

> df <- data.frame(cola = 1:4)
> sprkrDF <- createDataFrame(sqlContext, df)
> sprkrDF$colb <- NA
Error: class(value) == "Column" || is.null(value) is not TRUE

Thanks

NB: I want to add the column directly to the SparkR DataFrame, so this is not the solution I'm looking for:

> df <- data.frame(cola = 1:4, colb = NA)
> sprkrDF <- createDataFrame(sqlContext, df)

Answer 1:

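We can use lit() to create a new column filled with NAs. The assignment in the question fails because $<- on a SparkR DataFrame only accepts a Column (or NULL, which drops the column), and lit() wraps a literal value in a Column: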

sprkrDF <- withColumn(sprkrDF, "colb", lit(NULL))
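A possible refinement, sketched here under assumptions that are not part of the original answer: lit(NULL) yields a column of Spark's null type, which some sinks (Parquet, for example) may refuse to write, so casting it to a concrete type can help. The sketch assumes SparkR is installed and Spark 2.x or later is available, where a session is started with sparkR.session() and createDataFrame() no longer takes sqlContext; the target type "double" is an arbitrary illustrative choice.

```r
library(SparkR)

# Assumption: Spark 2.x+ running locally; Spark 1.x code would pass sqlContext instead.
sparkR.session(master = "local[1]")

sdf <- createDataFrame(data.frame(cola = 1:4))

# lit(NULL) wraps a null literal in a Column; cast() gives it a concrete type
# ("double" is only an example; pick whatever type the column should hold).
sdf <- withColumn(sdf, "colb", cast(lit(NULL), "double"))

# Inspect the resulting schema.
printSchema(sdf)

# Collected back into R, the nulls appear as NA.
head(sdf)
```

Call sparkR.session.stop() once you are done with the session.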

Answer 2:

I agree that @mtoto's answer is the right one for the specific question you asked. An alternative approach is to populate the NA values in the R data.frame before you create the Spark DataFrame. Working in base R can make some tasks easier when (a) you don't need distributed processing power and (b) you want to index specific rows in the data.

df <- data.frame(cola = 1:4)
df$colb <- NA
sprkrDF <- createDataFrame(sqlContext, df)
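If you take the base-R route, note that a bare NA in R is a logical constant, so the new column is logical (which SparkR would normally carry over as a boolean column). The sketch below uses only base R and shows typed NA constants that fix the column's type before any conversion; the column names colc and cold are hypothetical additions for illustration.

```r
# Build the column in the local data.frame first (no Spark needed here).
rdf <- data.frame(cola = 1:4)

# A bare NA is logical, so this column is logical; the scalar is recycled to 4 rows.
rdf$colb <- NA

# Typed NA constants give the column a concrete type instead.
rdf$colc <- NA_real_       # numeric (double)
rdf$cold <- NA_character_  # character

# Show the column types.
str(rdf)

# With a SparkR session running, the data.frame could then be converted, e.g.:
# sdf <- createDataFrame(rdf)
```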

Glad to see that someone else has learned to label R and Spark dataframe names clearly! I always use rdf for "R data.frame" and sdf for "Spark DataFrame" to make my code more readable :-)

Source: https://stackoverflow.com/questions/37812295/add-a-column-full-of-nas-in-sparkr

Tags

sparkr
