Question
How do I bind two DataFrame columns together in SparkR (Spark 1.4)?
TIA, Arun
Answer 1:
There is no direct way to do this. There is a related question about Spark 1.3 in Scala. The only workaround is to have some kind of row number in each DataFrame, because you can then join on that row number. Why? Because you can only join tables, or add columns, based on columns that already exist in the same DataFrame. Take these two DataFrames:
data1 <- createDataFrame(sqlContext, data.frame(a=c(1,2,3)))
data2 <- createDataFrame(sqlContext, data.frame(b=c(2,3,4)))
Then
withColumn(data1,"b",data1$a + 1)
is allowed, but
withColumn(data1,"b",data2$b)
is not. Once Spark splits your DataFrame into blocks for storage, it has no notion of row order, so it cannot line up the rows of two DataFrames unless each one carries its own row numbers.
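As a minimal sketch of the row-number workaround described above: if the data starts out as local R data frames, a key column can be added before the data is handed to Spark, and the two DataFrames can then be joined on it. The names row_id, local1 and local2 are made up for illustration, and the Spark part assumes you are in a SparkR 1.4 shell where sqlContext already exists. Treat it as one possible approach, not a built-in column bind.

```r
# Local R data frames, as in the example above.
local1 <- data.frame(a = c(1, 2, 3))
local2 <- data.frame(b = c(2, 3, 4))

# Add an explicit row number to each (row_id is a hypothetical key name).
local1$row_id <- seq_len(nrow(local1))
local2$row_id <- seq_len(nrow(local2))

# The Spark part only runs inside a SparkR session that provides sqlContext.
if (exists("sqlContext")) {
  data1 <- createDataFrame(sqlContext, local1)
  data2 <- createDataFrame(sqlContext, local2)

  # Join on the shared row number, then keep the key and the two value columns.
  joined <- join(data1, data2, data1$row_id == data2$row_id, "inner")
  bound <- select(joined, data1$row_id, data1$a, data2$b)
  head(bound)
}
```

The row order of the joined result is not guaranteed, which is why row_id is kept in the output: you can sort on it if the order matters. This only helps when you can attach the row numbers before the data reaches Spark.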
Source: https://stackoverflow.com/questions/31589222/how-to-do-bind-two-dataframe-columns-in-sparkr