I have a DataFrame df loaded from a Hive table. It has a timestamp column, say ts, stored as a string in the format dd-MMM-yy hh.mm.ss.MS a (co
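from pyspark.sql.functions import col, unix_timestamp

# ts is a string, so it is parsed with an explicit pattern before subtracting.
# The pattern is an assumption (SSS for the fractional part); adjust it to the real data.
ts_format = "dd-MMM-yy hh.mm.ss.SSS a"
# unix_timestamp() with no arguments returns the current time in seconds.
df = df.withColumn("seconds_from_now", unix_timestamp() - unix_timestamp(col("ts"), ts_format))
df = df.filter(df.seconds_from_now <= 5 * 60).drop("seconds_from_now")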
df is the resulting DataFrame, containing only the rows from the last five minutes.
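Before wiring a pattern into Spark, it can help to check the parsing and the five-minute cutoff on a few sample strings in plain Python. The following is only a sketch under assumptions: `recent_rows`, `TS_FORMAT`, and the sample values are hypothetical, the strptime pattern is a guess at the string layout described above (with `%f` standing in for the fractional-seconds field), and `%b` expects English month abbreviations.

```python
from datetime import datetime, timedelta

# Assumed strptime equivalent of the string layout; %f covers the
# fractional seconds and %p the AM/PM marker. Adjust to the real data.
TS_FORMAT = "%d-%b-%y %I.%M.%S.%f %p"


def recent_rows(ts_strings, now, window=timedelta(minutes=5)):
    """Return the strings whose parsed time lies within `window` before `now`."""
    cutoff = now - window
    kept = []
    for raw in ts_strings:
        parsed = datetime.strptime(raw, TS_FORMAT)
        if cutoff <= parsed <= now:
            kept.append(raw)
    return kept


# Hypothetical sample values, with a fixed "now" so the result is repeatable.
now = datetime(2019, 3, 15, 14, 30, 0)
sample = [
    "15-Mar-19 02.27.10.000000 PM",  # inside the five-minute window
    "15-Mar-19 02.10.00.000000 PM",  # twenty minutes old, dropped
    "15-Mar-19 02.29.59.500000 PM",  # inside the window
]
print(recent_rows(sample, now))
```

If the sample strings parse and the expected ones survive the cutoff, the same comparison (current time minus parsed time, at most 300 seconds) is what the Spark filter needs to express.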