Column filtering in PySpark

时光取名叫无心 2021-01-31 06:39

I have a dataframe df loaded from a Hive table. It has a timestamp column, say ts, stored as a string in the format dd-MMM-yy hh.mm.ss.MS a (co…

2 Answers
  •  自闭症患者
    2021-01-31 07:11

    from pyspark.sql.functions import col, unix_timestamp

    # Parse ts with its string pattern (adjust the fractional-second part to your actual data),
    # compute its age in seconds, then keep only the last five minutes and drop the helper column.
    df = df.withColumn("seconds_from_now",
                       unix_timestamp() - unix_timestamp(col("ts"), "dd-MMM-yy hh.mm.ss.SSS a"))
    df = df.filter(df.seconds_from_now <= 5 * 60).drop("seconds_from_now")
    

    df is the resulting dataframe, containing only the rows from the last five minutes.
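
    If you prefer not to create and then drop a helper column, a roughly equivalent sketch is the single filter below; the to_timestamp pattern is an assumption and must be adjusted to match the real format of ts:

    from pyspark.sql.functions import col, current_timestamp, expr, to_timestamp

    # Keep rows whose ts (parsed with an assumed pattern) is at most five minutes old.
    recent = df.filter(
        to_timestamp(col("ts"), "dd-MMM-yy hh.mm.ss.SSS a")
        >= current_timestamp() - expr("INTERVAL 5 MINUTES")
    )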
