Spark DataFrame: how to add an index column (aka distributed data index)

我寻月下人不归 2020-11-27 18:49

I read data from a CSV file, but it doesn't have an index.

I want to add a column numbered from 1 to the number of rows.

What should I do? Thanks. (Scala)

7 answers
  •  无人及你
    2020-11-27 18:55

    With Scala you can use:

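    import org.apache.spark.sql.functions._

    // monotonically_increasing_id() replaces the deprecated monotonicallyIncreasingId.
    // The ids are unique and increasing, but not guaranteed to be consecutive.
    df.withColumn("id", monotonically_increasing_id())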
    

    You can refer to this example and the Scala docs.

    With PySpark you can use:

    from pyspark.sql.functions import monotonically_increasing_id

    df_index = df.withColumn("id", monotonically_increasing_id())
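One caveat for the "1 to number of rows" requirement: the ids from `monotonically_increasing_id` are unique and increasing, but they are not consecutive. The Spark documentation describes the current implementation as putting the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The sketch below is plain Python with no Spark involved, and the function names and partition sizes are made up for illustration. It only mimics that layout next to a consecutive numbering, to show why the two differ.

```python
# Pure-Python sketch (no Spark needed) of two ways to number rows
# that are spread across partitions. All names here are hypothetical.

def monotonic_ids(partition_sizes):
    """IDs laid out like Spark's documented scheme: partition index in the
    upper 31 bits, row position within the partition in the lower 33 bits."""
    return [
        (partition << 33) + position
        for partition, size in enumerate(partition_sizes)
        for position in range(size)
    ]


def consecutive_ids(partition_sizes, start=1):
    """IDs start..start+N-1: each partition begins where the previous one
    ended, so the row counts of all earlier partitions must be known first."""
    ids, offset = [], start
    for size in partition_sizes:
        ids.extend(range(offset, offset + size))
        offset += size
    return ids


sizes = [3, 2]  # assumption: two partitions holding 3 and 2 rows
print(monotonic_ids(sizes))    # [0, 1, 2, 8589934592, 8589934593]
print(consecutive_ids(sizes))  # [1, 2, 3, 4, 5]
```

In Spark itself, the usual routes to a consecutive index are `row_number()` over a window ordered by this id, or `rdd.zipWithIndex()`. Note that a window with no partitioning moves all rows into a single partition, so it may be slow on large data.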
    
