Spark DataFrame: How to add an index column (aka distributed data index)

我寻月下人不归 2020-11-27 18:49

I read data from a CSV file, but it doesn't have an index.

I want to add a column numbered from 1 to the number of rows.

What should I do? Thanks. (Scala)

7 answers
  •  误落风尘
    2020-11-27 19:22

    monotonically_increasing_id - The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive.
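    A minimal sketch of what "unique but not consecutive" looks like in practice. The local SparkSession, the app name, and the four-row sample data are assumptions made for illustration; in spark-shell the `spark` value already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[2]").appName("mono-id-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data, spread over two partitions.
val sample = Seq("a", "b", "c", "d").toDF("value").repartition(2)

// In the current implementation the partition ID goes into the upper 31 bits
// and the record number within the partition into the lower 33 bits,
// so the IDs are unique and increasing but jump between partitions.
val withId = sample.withColumn("id", monotonically_increasing_id())
withId.show()
```

    With more than one partition, the IDs in the second partition start at a large offset rather than continuing from the first, which is why this function alone does not produce a 1..N numbering.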

    "I want to add a column from 1 to row's number."

    Let's say we have the following DataFrame:

    +--------+-------------+-------+
    | userId | productCode | count |
    +--------+-------------+-------+
    |     25 |        6001 |     2 |
    |     11 |        5001 |     8 |
    |     23 |         123 |     5 |
    +--------+-------------+-------+
    

    To generate consecutive IDs starting from 1:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.row_number
    val w = Window.orderBy("count")
    val result = df.withColumn("index", row_number().over(w))
    

    This adds an index column numbered in increasing order of count:

    +--------+-------------+-------+-------+
    | userId | productCode | count | index |
    +--------+-------------+-------+-------+
    |     25 |        6001 |     2 |     1 |
    |     23 |         123 |     5 |     2 |
    |     11 |        5001 |     8 |     3 |
    +--------+-------------+-------+-------+
    
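    Note that a window with orderBy and no partitionBy makes Spark move all rows into a single partition (it logs a warning about this), and the numbering follows the sort column rather than the order of the file. If the current row order should be kept, one alternative (a different technique from the window above, not part of the original answer) is RDD `zipWithIndex`. The sketch below assumes a local SparkSession and rebuilds the example DataFrame; the names `indexedRows` and `schema` are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[2]").appName("zip-index-sketch").getOrCreate()
import spark.implicits._

// The example DataFrame from the answer above.
val df = Seq((25, 6001, 2), (11, 5001, 8), (23, 123, 5))
  .toDF("userId", "productCode", "count")

// zipWithIndex assigns consecutive 0-based indices following partition order,
// so add 1 to start the numbering at 1.
val indexedRows = df.rdd.zipWithIndex().map { case (row, idx) =>
  Row.fromSeq(row.toSeq :+ (idx + 1L))
}

// Extend the original schema with the new Long column.
val schema = StructType(df.schema.fields :+ StructField("index", LongType, nullable = false))
val result = spark.createDataFrame(indexedRows, schema)
result.show()
```

    This keeps the data distributed, at the cost of a round trip through the RDD API and one extra Spark job to count the rows in each partition.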
