Question
How do I add a column full of NA in a SparkR DataFrame? This doesn't work:
> df <- data.frame(cola = 1:4)
> sprkrDF <- createDataFrame(sqlContext, df)
> sprkrDF$colb <- NA
Error: class(value) == "Column" || is.null(value) is not TRUE
Thanks
NB: I want to add it directly to the SparkR DataFrame, so this is not the solution I'm looking for:
> df <- data.frame(cola = 1:4, colb = NA)
> sprkrDF <- createDataFrame(sqlContext, df)
Answer 1:
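The assignment in the question fails because, as the error message says, the value assigned with $ must be a Column (or NULL), and a bare NA is neither. We can use lit() to wrap a literal in a Column, create the new column with it, and fill it with NAs: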
sprkrDF <- withColumn(sprkrDF, "colb", lit(NULL))
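A column built from lit(NULL) alone carries no concrete data type, which may matter if the column is later filled, joined, or written out. The following is a minimal sketch, not a definitive recipe, that casts the null literal to a type. It assumes the Spark 1.x-style setup used in the question (sparkR.init and sparkRSQL.init with a local master); on Spark 2.0 or later the equivalent would be sparkR.session() and createDataFrame(df). The choice of "double" as the target type is also just an assumption for illustration.

```r
library(SparkR)

# Spark 1.x-style setup, matching the question; a local master is assumed.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

df <- data.frame(cola = 1:4)
sprkrDF <- createDataFrame(sqlContext, df)

# lit(NULL) produces a null literal Column; cast() gives it a concrete type.
# "double" is an illustrative choice, pick whatever type the column needs.
sprkrDF <- withColumn(sprkrDF, "colb", cast(lit(NULL), "double"))

# Inspect the schema and the first rows; Spark nulls are expected to
# show up as NA once the rows are collected into an R data.frame.
printSchema(sprkrDF)
head(sprkrDF)
```

If the type of the new column does not matter, the plain lit(NULL) form above is shorter and does the job.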
Answer 2:
Agreed that @mtoto's answer is the right one for the specific question you asked. An alternative approach is to populate the NA values in the R data.frame before you create the Spark DataFrame. Working in base R can make some tasks easier when (a) you don't need distributed processing power and (b) you want to index specific rows in the data.
df <- data.frame(cola = 1:4)
df$colb <- NA
sprkrDF <- createDataFrame(sqlContext, df)
Glad to see that someone else has learned to prefix R and Spark DataFrame names clearly! I always use rdf for "R data.frame" and sdf for "Spark DataFrame" to make my code more readable :-)
Source: https://stackoverflow.com/questions/37812295/add-a-column-full-of-nas-in-sparkr