Spark DataFrame: How to Add an Index Column (aka Distributed Data Index)

Backend · Unresolved

我寻月下人不归 2020-11-27 18:49

I read data from a CSV file, but it doesn't have an index.

I want to add a column numbered from 1 to the number of rows.

What should I do? Thanks. (Scala)

7 answers

北海茫月 (original poster)

2020-11-27 19:04
How to get a sequential ID column with values [1, 2, 3, 4, ..., n]:
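```python
from pyspark.sql.functions import row_number, monotonically_increasing_id
from pyspark.sql.window import Window

# df: the DataFrame read from the CSV file.
# monotonically_increasing_id() is unique and increasing but NOT consecutive,
# so row_number() over a window ordered by it yields 1, 2, 3, ..., n.
# A window without partitionBy() moves all rows into a single partition.
df_with_seq_id = df.withColumn(
    'index_column_name',
    row_number().over(Window.orderBy(monotonically_increasing_id())))
```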
Note that row_number() starts at 1, so subtract 1 from it if you want a 0-indexed column.
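Why the two-step combination is needed, and what the usual alternative does, can be sketched without a cluster. This is a pure-Python simulation under stated assumptions: each partition is modelled as a plain Python list, and the helper names (`monotonically_increasing_ids`, `row_number_by_id`, `zip_with_index`) are hypothetical, not Spark API. It mimics the ID layout Spark documents for `monotonically_increasing_id()` (partition ID in the upper 31 bits, record number within the partition in the lower 33 bits), which is why the raw IDs jump between partitions. It then shows two ways to get consecutive numbers: sorting every ID in one place, which is conceptually what an un-partitioned window does, and the two-pass idea behind `RDD.zipWithIndex()`: count each partition, turn the counts into offsets, then number rows locally.

```python
from itertools import accumulate
from typing import List, Tuple

# Hypothetical simulation: each inner list stands for one Spark partition.
Partitions = List[List[str]]


def monotonically_increasing_ids(partitions: Partitions) -> List[int]:
    """Mimic the documented layout of monotonically_increasing_id():
    partition index in the upper bits, record number in the lower 33 bits."""
    return [
        (pid << 33) + i
        for pid, part in enumerate(partitions)
        for i in range(len(part))
    ]


def row_number_by_id(partitions: Partitions) -> List[Tuple[str, int]]:
    """Conceptually what row_number().over(Window.orderBy(id)) does without
    partitionBy(): gather all (id, row) pairs in one place, sort, number 1..n."""
    rows = [row for part in partitions for row in part]
    ids = monotonically_increasing_ids(partitions)
    return [(row, n) for n, (_, row) in enumerate(sorted(zip(ids, rows)), start=1)]


def zip_with_index(partitions: Partitions, start: int = 1) -> List[Tuple[str, int]]:
    """Two-pass idea behind RDD.zipWithIndex(): count rows per partition,
    convert the counts to starting offsets, then number rows locally.
    The `start` parameter is an illustration only (zipWithIndex starts at 0)."""
    counts = [len(part) for part in partitions]    # pass 1: one count per partition
    offsets = [0] + list(accumulate(counts))[:-1]  # exclusive prefix sum of the counts
    return [
        (row, start + offsets[pid] + i)            # pass 2: no global sort needed
        for pid, part in enumerate(partitions)
        for i, row in enumerate(part)
    ]


if __name__ == "__main__":
    parts = [["a", "b"], ["c"], ["d", "e", "f"]]
    print(monotonically_increasing_ids(parts))
    # [0, 1, 8589934592, 17179869184, 17179869185, 17179869186]
    print(zip_with_index(parts))
    # [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('f', 6)]
```

Both `row_number_by_id` and `zip_with_index` produce the same 1..n numbering, but only the second avoids collecting everything into one place. In Spark itself, `df.rdd.zipWithIndex()` (available in both PySpark and Scala) is the usual way to get the offset-based version; it returns 0-based indices and triggers an extra job when the RDD has more than one partition.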