How to bind two DataFrame columns in SparkR?

二次信任 submitted on 2019-12-23 17:53:10

Question


How can I bind two DataFrame columns together in SparkR on Spark 1.4?

TIA, Arun


Answer 1:


There is no direct way to do this. A similar question has been asked about Spark 1.3 in Scala. The only way to achieve it is to have some kind of row numbering, because then you can join on the row number. Why? Because you can only join tables, or add columns, based on columns that already exist.

data1 <- createDataFrame(sqlContext, data.frame(a=c(1,2,3)))
data2 <- createDataFrame(sqlContext, data.frame(b=c(2,3,4)))

Then

withColumn(data1,"b",data1$a + 1)

is allowed, but

withColumn(data1,"b",data2$b)

is not. Once Spark splits your DataFrame into blocks (partitions) to store it, it no longer knows the order of the rows, so it cannot tell how the rows of two separate DataFrames line up. It can only do that when you have explicit row numbers to join on.
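The row-numbering workaround described above can be sketched as follows. This is a minimal sketch, assuming Spark 2.x or later, where SparkR provides `monotonically_increasing_id()`, `row_number()`, `windowOrderBy()` and `over()`; these functions are not part of the Spark 1.4 API used in the question, and `add_row_number` is a hypothetical helper. Ordering by `monotonically_increasing_id()` assumes each DataFrame's partition order still matches its original row order, which holds for a small DataFrame freshly created from a local `data.frame` but is not guaranteed after shuffles.

```r
# Sketch only: assumes the Spark 2.x+ SparkR API, not the Spark 1.4 API above.
library(SparkR)
sparkR.session()

data1 <- createDataFrame(data.frame(a = c(1, 2, 3)))
data2 <- createDataFrame(data.frame(b = c(2, 3, 4)))

# Hypothetical helper: attach a consecutive row number (1, 2, 3, ...) to a
# SparkDataFrame. Ordering by monotonically_increasing_id() assumes the
# partition order still reflects the original row order.
add_row_number <- function(sdf, name) {
  w <- windowOrderBy(monotonically_increasing_id())
  withColumn(sdf, name, over(row_number(), w))
}

# Use distinct index names so the joined result has no ambiguous columns.
indexed1 <- add_row_number(data1, "rn1")
indexed2 <- add_row_number(data2, "rn2")

# Join on the row numbers: the only "key" the two DataFrames share.
joined <- join(indexed1, indexed2, indexed1$rn1 == indexed2$rn2, "inner")

# Restore the original row order and keep only the data columns.
result <- select(arrange(joined, "rn1"), "a", "b")
head(result)
```

A window with no partitioning moves all rows into a single partition, so this approach is only practical for DataFrames small enough to be processed on one executor.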



Source: https://stackoverflow.com/questions/31589222/how-to-do-bind-two-dataframe-columns-in-sparkr
