Column filtering in PySpark

时光取名叫无心 2021-01-31 06:39

I have a dataframe df loaded from a Hive table. It has a timestamp column, say ts, stored as a string in the format dd-MMM-yy hh.mm.ss.MS a (co…

2 Answers
  •  自闭症患者
    2021-01-31 07:11

    from pyspark.sql.functions import col, unix_timestamp

    # Parse ts with its string pattern (adjust the fractional-second part to your actual data),
    # compute its age in seconds, then keep only the last five minutes and drop the helper column.
    df = df.withColumn("seconds_from_now",
                       unix_timestamp() - unix_timestamp(col("ts"), "dd-MMM-yy hh.mm.ss.SSS a"))
    df = df.filter(df.seconds_from_now <= 5 * 60).drop("seconds_from_now")
    

    df is the resulting dataframe, containing only the rows from the last five minutes.
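
    If you prefer not to create and then drop a helper column, a roughly equivalent sketch is the single filter below; the to_timestamp pattern is an assumption and must be adjusted to match the real format of ts:

    from pyspark.sql.functions import col, current_timestamp, expr, to_timestamp

    # Keep rows whose ts (parsed with an assumed pattern) is at most five minutes old.
    recent = df.filter(
        to_timestamp(col("ts"), "dd-MMM-yy hh.mm.ss.SSS a")
        >= current_timestamp() - expr("INTERVAL 5 MINUTES")
    )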
