How do I truncate a PySpark dataframe of timestamp type to the day?

悲&欢浪女 2021-01-12 13:01

I have a PySpark dataframe that includes timestamps in a column (call the column 'dt'), like this:

2018-04-07 16:46:00
2018-03-06 22:18:00
3 answers
  •  甜味超标
    2021-01-12 13:59

    For Spark <= 2.2.0 (before date_trunc was available), use to_date and cast the result back to a timestamp:

    from pyspark.sql.functions import to_date, col
    from pyspark.sql.session import SparkSession
    from pyspark.sql.types import TimestampType

    spark = SparkSession.builder.getOrCreate()

    # Cast the string to a timestamp, then truncate it: to_date drops the
    # time part, and casting back to TimestampType gives midnight of that day.
    spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
        .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
        .withColumn('date', to_date('timestamp').astype(TimestampType())) \
        .show(truncate=False)
    
    +-------------------+-------------------+
    |timestamp          |date               |
    +-------------------+-------------------+
    |2020-10-03 05:00:00|2020-10-03 00:00:00|
    +-------------------+-------------------+
    

    For Spark > 2.2.0 (2.3.0 and later), use date_trunc. See also: datetime patterns in Spark 3.0.0.

    from pyspark.sql.functions import date_trunc, col
    from pyspark.sql.session import SparkSession
    from pyspark.sql.types import TimestampType
    
    spark = SparkSession.builder.getOrCreate()
    
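    # date_trunc keeps the column as a timestamp and resets every field
    # smaller than the given unit ('day' here) to its lowest value.
    spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
        .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
        .withColumn('date', date_trunc(timestamp='timestamp', format='day')) \
        .show(truncate=False)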
    
    +-------------------+-------------------+
    |timestamp          |date               |
    +-------------------+-------------------+
    |2020-10-03 05:00:00|2020-10-03 00:00:00|
    +-------------------+-------------------+
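
As a sanity check outside Spark, the values expected for the question's two sample rows can be worked out with Python's standard datetime module. This is only a minimal sketch of what truncating to the day means, not a replacement for the Spark functions above. The helper name truncate_to_day is hypothetical and is not part of PySpark.

```python
from datetime import datetime


def truncate_to_day(ts: datetime) -> datetime:
    """Return ts with the time-of-day fields zeroed (midnight of the same day)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Sample values taken from the question's 'dt' column.
samples = ["2018-04-07 16:46:00", "2018-03-06 22:18:00"]
fmt = "%Y-%m-%d %H:%M:%S"

# Parse each string, truncate it, and format it back for comparison.
truncated = [truncate_to_day(datetime.strptime(s, fmt)).strftime(fmt) for s in samples]
print(truncated)
```

Both Spark approaches shown above should give the same per-row values for the 'dt' column, with the result still stored as a timestamp.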
    
