How to calculate date difference in pyspark?

忘了有多久 asked 2020-12-10 01:14

I have data like this:

df = sqlContext.createDataFrame([
    ('1986/10/15', 'z', 'null'),
    ('1986/10/15', 'z', 'null'),
    ('1986/10/15', 'c', 'null'),
    ('1986/10/15', None, 'null'),
    ('1986/10/16', None, '4.0')],
    ('low', 'high', 'normal'))

I want to calculate the date difference between the low column and 2017-05-02.
2 Answers
  •  借酒劲吻你
    2020-12-10 01:46

    You need to cast the low column to the date type, and then you can use datediff() in combination with lit(). With Spark 2.2 or later:

    from pyspark.sql.functions import datediff, to_date, lit

    # datediff(end, start) counts the days from start to end; the
    # two-argument to_date parses "low" with its yyyy/MM/dd pattern
    df.withColumn("test",
                  datediff(to_date(lit("2017-05-02")),
                           to_date("low", "yyyy/MM/dd"))).show()
    +----------+----+------+-----+
    |       low|high|normal| test|
    +----------+----+------+-----+
    |1986/10/15|   z|  null|11157|
    |1986/10/15|   z|  null|11157|
    |1986/10/15|   c|  null|11157|
    |1986/10/15|null|  null|11157|
    |1986/10/16|null|   4.0|11156|
    +----------+----+------+-----+
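
    If the reference date should be today rather than a fixed literal, the same pattern works with current_date() in place of lit() (a minimal sketch, still assuming Spark 2.2+; the column name days_ago is just illustrative):

    from pyspark.sql.functions import current_date, datediff, to_date

    # current_date() returns today's date as a date column
    df.withColumn("days_ago",
                  datediff(current_date(),
                           to_date("low", "yyyy/MM/dd"))).show()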
    

    With Spark versions before 2.2, we need to convert the low column to a timestamp first:

    from pyspark.sql.functions import datediff, to_date, lit, unix_timestamp

    # unix_timestamp parses the yyyy/MM/dd string into seconds since the epoch;
    # casting to timestamp lets the one-argument to_date extract the date part
    df.withColumn("test",
                  datediff(to_date(lit("2017-05-02")),
                           to_date(unix_timestamp('low', "yyyy/MM/dd").cast("timestamp")))).show()
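
    For completeness, here is a self-contained sketch of the whole round trip on a modern Spark (assuming SparkSession, which replaced sqlContext in Spark 2.0; the app name is just illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import datediff, to_date, lit

    spark = SparkSession.builder.appName("datediff-example").getOrCreate()

    # Recreate the question's DataFrame with SparkSession instead of sqlContext
    df = spark.createDataFrame([
        ('1986/10/15', 'z', 'null'),
        ('1986/10/15', 'z', 'null'),
        ('1986/10/15', 'c', 'null'),
        ('1986/10/15', None, 'null'),
        ('1986/10/16', None, '4.0')],
        ('low', 'high', 'normal'))

    # datediff(end, start) returns the number of days from start to end
    df.withColumn("test",
                  datediff(to_date(lit("2017-05-02")),
                           to_date("low", "yyyy/MM/dd"))).show()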
    
