Calculating duration by subtracting two datetime columns in string format

盖世英雄少女心 2020-12-04 15:40

I have a Spark DataFrame that consists of a series of dates:

from pyspark.sql import SQLContext
from pyspark.sql import Row
from pyspark.sql.types import *

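A minimal, hypothetical reconstruction of such a DataFrame, assuming two date columns stored as dd-MMM-yy strings (the column names and sample values are borrowed from the answer below):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: two date columns kept as dd-MMM-yy strings
df = spark.createDataFrame(
    [("24-JAN-17", "16-JAN-17"), ("19-JAN-05", "18-JAN-05")],
    ["COL_1", "COL_2"],
)
# Register as a temp view so it can be queried with spark.sql
df.createOrReplaceTempView("MyTable")
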
6 Answers
  •  心在旅途
    2020-12-04 16:16

    This can be done in spark-sql by converting the string dates to timestamps and then taking the difference.

    1: Convert to timestamp:

    CAST(UNIX_TIMESTAMP(MY_COL_NAME, 'dd-MMM-yy') AS TIMESTAMP)
    

    2: Get the difference between the dates using the datediff function.

    These two steps combine into a single nested expression:

    spark.sql("select COL_1, COL_2, datediff( CAST( UNIX_TIMESTAMP( COL_1,'dd-MMM-yy') as TIMESTAMP), CAST( UNIX_TIMESTAMP( COL_2,'dd-MMM-yy') as TIMESTAMP) ) as LAG_in_days from MyTable")
    

    Below is the result:

    +---------+---------+-----------+
    |    COL_1|    COL_2|LAG_in_days|
    +---------+---------+-----------+
    |24-JAN-17|16-JAN-17|          8|
    |19-JAN-05|18-JAN-05|          1|
    |23-MAY-06|23-MAY-06|          0|
    |18-AUG-06|17-AUG-06|          1|
    +---------+---------+-----------+
    

    Reference: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/2458071/Date+Functions+and+Properties+Spark+SQL
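    For the DataFrame API, a minimal sketch of the same two steps (assuming the df and column names from the question's reconstruction above, and the same dd-MMM-yy format):

    from pyspark.sql import functions as F

    # Mirror CAST(UNIX_TIMESTAMP(col, 'dd-MMM-yy') AS TIMESTAMP) with unix_timestamp + cast,
    # then take the day difference with datediff.
    # On Spark 3+, uppercase month names like "JAN" may require
    # spark.sql.legacy.timeParserPolicy=LEGACY to parse.
    result = df.select(
        "COL_1",
        "COL_2",
        F.datediff(
            F.unix_timestamp("COL_1", "dd-MMM-yy").cast("timestamp"),
            F.unix_timestamp("COL_2", "dd-MMM-yy").cast("timestamp"),
        ).alias("LAG_in_days"),
    )
    result.show()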
