Spark DataFrame: How to add an index column (aka distributed data index)

我寻月下人不归 2020-11-27 18:49

I read data from a CSV file, but it has no index column.

I want to add a column numbered from 1 to the number of rows.

How should I do this? Thanks. (Scala)

7 answers
  •  北海茫月
    2020-11-27 19:04

    How to get a sequential id column with values 1, 2, 3, ..., n:

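    from pyspark.sql.functions import monotonically_increasing_id, row_number
    from pyspark.sql.window import Window

    # Number the rows in their current order. row_number() starts at 1,
    # so the "- 1" makes the index 0-based; drop it to get 1..n.
    w = Window.orderBy(monotonically_increasing_id())
    df_with_seq_id = df.withColumn('index_column_name', row_number().over(w) - 1)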
    

    Note that row_number() starts at 1, so subtract 1 only if you want a 0-indexed column; leave the subtraction out to number rows from 1, as the question asks.
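
The window above has no partitionBy, so Spark moves every row into a single partition to number them, which can be slow on large data. One common way around this is to count the rows in each partition and give each partition a starting offset, which is the idea behind RDD.zipWithIndex. The block below is only a plain-Python sketch of that idea, not Spark code: partitions are modeled as ordinary lists, and the function name add_index and its start parameter are hypothetical.

```python
from itertools import accumulate
from typing import Any, List, Tuple


def add_index(partitions: List[List[Any]], start: int = 1) -> List[List[Tuple[int, Any]]]:
    """Attach a consecutive index to every row, partition by partition."""
    # Pass 1: count the rows in each partition.
    sizes = [len(p) for p in partitions]
    # Offset of partition k = start + number of rows in partitions 0..k-1.
    offsets = [start + s for s in accumulate([0] + sizes[:-1])]
    # Pass 2: each partition numbers its own rows, beginning at its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


# Three uneven "partitions" standing in for a distributed dataset.
parts = [["a", "b"], [], ["c", "d", "e"]]
indexed = add_index(parts)
```

Only the per-partition counts need to be gathered in one place; the rows themselves stay where they are, which is why this approach scales better than a global window. Passing start=0 gives a 0-based index instead.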
