
Append a column to a DataFrame in Apache Spark 1.3


爱一瞬间的悲伤 2020-11-27 13:41

Is it possible to add a column to a DataFrame, and if so, what is the most efficient and cleanest way to do it?

More specifically, the column should serve as row IDs for the existing data.

4 answers

栀梦 (OP)

2020-11-27 14:23
I built on the answer above. However, I find it incomplete if you want to produce a modified DataFrame, and the current APIs are a little different in Spark 1.6. `zipWithIndex()` returns an RDD of `(Row, Long)` tuples, pairing each row with its index. We can use it to build new rows as needed.
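```
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Pair every row with its 0-based index, then prepend the index (as a String) to the row's values
val rdd = df.rdd.zipWithIndex()
             .map { case (row, idx) => Row.fromSeq(idx.toString +: row.toSeq) }
// New schema: the "Row number" column followed by the original fields
val newstructure = StructType(StructField("Row number", StringType, true) +: df.schema.fields)
sqlContext.createDataFrame(rdd, newstructure).show()
```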
I hope this will be helpful.
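To see what the `zipWithIndex` step does without a Spark cluster, here is a minimal plain-Scala sketch on local collections. It assumes a row can be modeled as a `Seq[Any]` (standing in for Spark's `Row`), and the names `RowIdSketch`, `withRowNumber` and `withRowNumberHeader` are hypothetical. One difference from Spark: on a local `Seq`, `zipWithIndex` yields `Int` indices, whereas `RDD.zipWithIndex()` yields `Long`.

```scala
// Plain-Scala model of the zipWithIndex step (no Spark needed).
// Assumption: a "row" is modeled as Seq[Any], standing in for Spark's Row.
object RowIdSketch {
  // Prepend each row's 0-based index (as a String, like the answer above) to its values.
  def withRowNumber(rows: Seq[Seq[Any]]): Seq[Seq[Any]] =
    rows.zipWithIndex.map { case (row, idx) => idx.toString +: row }

  // Prepend the new column name to the existing column names (the "schema").
  def withRowNumberHeader(columns: Seq[String]): Seq[String] =
    "Row number" +: columns

  def main(args: Array[String]): Unit = {
    val header = Seq("name", "age")
    val rows: Seq[Seq[Any]] = Seq(Seq("alice", 30), Seq("bob", 25))
    // Prints: Row number | name | age
    println(withRowNumberHeader(header).mkString(" | "))
    // Prints: 0 | alice | 30, then 1 | bob | 25
    withRowNumber(rows).foreach(r => println(r.mkString(" | ")))
  }
}
```

On a real RDD the indices follow partition order first and then item order within each partition, and the Spark docs note that `zipWithIndex()` triggers an extra Spark job when the RDD has more than one partition.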