From a PySpark SQL dataframe like

```
name age city
abc  20  A
def  30  B
```
How do I get the last row? (Like with `df.limit(1)` I can get the first row of the dataframe into a new dataframe.)
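For concreteness, a minimal sketch of building that frame and grabbing the first row (the `spark` session setup here is an assumption, not part of the question):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("abc", 20, "A"), ("def", 30, "B")],
    ["name", "age", "city"],
)

df.limit(1).show()  # first row, as mentioned in the question
```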
A long and ugly way, which assumes that all columns are orderable:
```python
from pyspark.sql.functions import (
    col, max as max_, struct, monotonically_increasing_id
)

last_row = (df
    # Tag each row with a monotonically increasing id.
    .withColumn("_id", monotonically_increasing_id())
    # Structs compare field by field, so the max struct is the row
    # with the largest _id, i.e. the last one.
    .select(max_(struct("_id", *df.columns)).alias("tmp"))
    # Unpack the struct back into columns and drop the helper id.
    .select(col("tmp.*"))
    .drop("_id"))
```
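With the sample frame from the question, this should pick the `def` row (the output below is reasoned from the struct ordering, not a captured run):

```python
last_row.show()
# Expected for the two-row sample frame:
# +----+---+----+
# |name|age|city|
# +----+---+----+
# | def| 30|   B|
# +----+---+----+
```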
If not all columns can be ordered you can try:
```python
with_id = df.withColumn("_id", monotonically_increasing_id())

# First pass: find the largest id; second pass: keep only that row.
i = with_id.select(max_("_id")).first()[0]
with_id.where(col("_id") == i).drop("_id")
```
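One caveat: `monotonically_increasing_id` is nondeterministic, and this variant scans the data twice (once for the max, once for the filter), so caching the intermediate frame is a reasonable precaution. A sketch:

```python
# Cache so the ids are computed once and stay consistent across the
# two jobs (the max() lookup and the final filter).
with_id = df.withColumn("_id", monotonically_increasing_id()).cache()
```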
Note: there is a `last` function in `pyspark.sql.functions` / `o.a.s.sql.functions`, but considering the description of the corresponding expressions it is not a good choice here.
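To illustrate why (a sketch, not part of the original answer): `last` is an aggregate whose result depends on the physical row order, which Spark does not guarantee after a shuffle, so it can silently return the wrong row:

```python
from pyspark.sql.functions import last

# Tempting one-liner, but the docs note the result depends on the
# order of the rows, which may be nondeterministic after a shuffle.
df.select([last(c).alias(c) for c in df.columns]).show()
```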
And how can I access the dataframe rows by index?
You cannot. A Spark DataFrame is a distributed data structure; its rows are not ordered and not accessible by index. You can add indices using `zipWithIndex` on the underlying RDD and filter later. Just keep in mind that this is an O(N) operation.
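A minimal sketch of that approach (`n` and `row_at_n` are hypothetical names for illustration; note that this scans the whole dataset):

```python
n = 1  # hypothetical 0-based row index to fetch

row_at_n = (df.rdd
    .zipWithIndex()                      # -> (Row, index) pairs
    .filter(lambda pair: pair[1] == n)   # keep only the requested index
    .map(lambda pair: pair[0])           # drop the index, keep the Row
    .collect())                          # [] if n is out of range
```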