Append a column to Data Frame in Apache Spark 1.3

爱一瞬间的悲伤 2020-11-27 13:41

Is it possible to add a column to a DataFrame, and if so, what is the most efficient and neat way to do it?

More specifically, the column may serve as row IDs for the existing DataFrame.

4 answers
  •  栀梦 (original poster)
    2020-11-27 14:23

    I built on the answer above. However, I found it incomplete when the goal is to modify a DataFrame, and the current APIs are a little different in Spark 1.6. Calling zipWithIndex() on df.rdd returns an RDD of (Row, Long) tuples, each pairing a row with its index. We can use these tuples to build new Rows as needed.

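    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    // Pair each row with its index, then prepend the index (as a string) to the row's values.
    val rdd = df.rdd.zipWithIndex()
                 .map(indexedRow => Row.fromSeq(indexedRow._2.toString +: indexedRow._1.toSeq))
    // Prepend a matching string field to the original schema.
    val newstructure = StructType(Seq(StructField("Row number", StringType, true)) ++ df.schema.fields)
    sqlContext.createDataFrame(rdd, newstructure).show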
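
    If a numeric ID is more convenient than a string, the same idea can be sketched as a small helper that appends the index as a LongType column. This is only an illustration under assumptions: the object name RowIds, the function withRowId and the default column name "row_id" are hypothetical and not part of Spark, and the caller is assumed to already have a DataFrame and a SQLContext.

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helper object (not part of Spark).
object RowIds {
  // Returns a new DataFrame with a Long row index appended as the last column.
  def withRowId(df: DataFrame, sqlContext: SQLContext, colName: String = "row_id"): DataFrame = {
    // zipWithIndex pairs every Row with a Long index; append that index to the row's values.
    val indexed = df.rdd.zipWithIndex().map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
    // Extend the original schema with a non-nullable LongType field in the same (last) position.
    val schema = StructType(df.schema.fields :+ StructField(colName, LongType, nullable = false))
    sqlContext.createDataFrame(indexed, schema)
  }
}
```

    With a DataFrame df in scope, RowIds.withRowId(df, sqlContext).show would display the original columns followed by the index. Note that zipWithIndex has to run a Spark job when the RDD has more than one partition, so it is not free on large data.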
    

    I hope this will be helpful.
