How to select the last row, and how to access a PySpark dataframe by index?

眼角桃花 2020-12-10 12:27

From a PySpark SQL dataframe like

name  age  city
abc   20   A
def   30   B

How can I get the last row? (With df.limit(1) I can get the first row of the dataframe into a new dataframe.) And how can I access dataframe rows by index?
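
For reference, a minimal sketch that builds the sample dataframe above (the SparkSession setup is an assumption, not shown in the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("abc", 20, "A"), ("def", 30, "B")],
    ["name", "age", "city"],
)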

4 Answers
  •  春和景丽
    2020-12-10 12:59

    How can I get the last row?

    A long and ugly way, which assumes that all columns are orderable:

    from pyspark.sql.functions import (
        col, max as max_, struct, monotonically_increasing_id
    )

    # Attach a monotonically increasing id, pick the struct with the
    # largest id (structs compare field by field, so _id decides),
    # then unpack the struct and drop the helper column.
    last_row = (df
        .withColumn("_id", monotonically_increasing_id())
        .select(max_(struct("_id", *df.columns)).alias("tmp"))
        .select(col("tmp.*"))
        .drop("_id"))
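
    On the sample data this picks the def row; a quick check (assuming the df built in the question):

    last_row.show()
    # +----+---+----+
    # |name|age|city|
    # +----+---+----+
    # | def| 30|   B|
    # +----+---+----+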
    

    If not all columns can be ordered, you can try:

    # Compute the largest generated id first, then filter for the
    # row that carries it.
    with_id = df.withColumn("_id", monotonically_increasing_id())
    i = with_id.select(max_("_id")).first()[0]

    with_id.where(col("_id") == i).drop("_id")
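
    Note that this takes two passes: first() triggers a job to find the maximum id, and the filtered result is only computed when the next action runs. Also, monotonically_increasing_id only guarantees ids that increase with the dataframe's current partition order, so "last" here means last in that order.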
    

    Note: there is a last function in pyspark.sql.functions / o.a.s.sql.functions, but considering the description of the corresponding expressions, it is not a good choice here.
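    As an aside (not part of the original answer): on Spark 3.0+ there is also DataFrame.tail, which collects the last n rows to the driver:

    # Assumes Spark >= 3.0; tail() returns a list of Row objects
    # collected to the driver, so keep n small.
    last = df.tail(1)[0]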

    how can I access the dataframe rows by index?

    You cannot. A Spark DataFrame is not indexed and cannot be accessed by index. You can add indices using zipWithIndex and filter later; just keep in mind this is an O(N) operation (a sketch follows below).
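
    A minimal sketch of that zipWithIndex approach (the helper column name _idx and the target index 12 are illustrative, not from the answer):

    from pyspark.sql import Row
    from pyspark.sql.functions import col

    # Zip each row with its position, promote the position to a column,
    # then filter for the row at the desired index.
    indexed = (df.rdd
        .zipWithIndex()
        .map(lambda pair: Row(_idx=pair[1], **pair[0].asDict()))
        .toDF())

    row_12 = indexed.where(col("_idx") == 12).drop("_idx")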
