I have a PySpark dataframe that includes timestamps in a column (call the column 'dt'), like this:
2018-04-07 16:46:00
2018-03-06 22:18:00
How can I truncate these timestamps to the start of the day while keeping the column a timestamp type?
For Spark <= 2.2.0, use to_date and cast the result back to a timestamp:
from pyspark.sql.functions import to_date, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# to_date drops the time portion; casting back to TimestampType
# gives the start of the day as a timestamp rather than a date.
spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
    .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
    .withColumn('date', to_date('timestamp').astype(TimestampType())) \
    .show(truncate=False)
+-------------------+-------------------+
|timestamp |date |
+-------------------+-------------------+
|2020-10-03 05:00:00|2020-10-03 00:00:00|
+-------------------+-------------------+
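Applied to the 'dt' column from the question, this is a minimal sketch; the dataframe name df is an assumption, since the question does not name it:

from pyspark.sql.functions import to_date, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# 'df' is a stand-in for the question's dataframe with its 'dt' column.
df = spark.createDataFrame(
    [['2018-04-07 16:46:00'], ['2018-03-06 22:18:00']], schema=['dt']
).withColumn('dt', col('dt').astype(TimestampType()))

# Truncate 'dt' to midnight while keeping it a timestamp.
df.withColumn('dt_day', to_date('dt').astype(TimestampType())) \
    .show(truncate=False)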
For Spark >= 2.3.0, use date_trunc (see the datetime patterns documented for Spark 3.0.0):
from pyspark.sql.functions import date_trunc, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# date_trunc('day', ...) zeroes out everything below the day level
# and already returns a timestamp, so no extra cast is needed.
spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
    .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
    .withColumn('date', date_trunc('day', 'timestamp')) \
    .show(truncate=False)
+-------------------+-------------------+
|timestamp |date |
+-------------------+-------------------+
|2020-10-03 05:00:00|2020-10-03 00:00:00|
+-------------------+-------------------+
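date_trunc also accepts coarser units such as 'week', 'month', and 'year', so the same pattern covers other truncations. A minimal sketch reusing the example data; the column names week_start and month_start are illustrative:

from pyspark.sql.functions import date_trunc, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
    .withColumn('timestamp', col('timestamp').astype(TimestampType()))

# 'week' and 'month' are valid format values for date_trunc.
df.withColumn('week_start', date_trunc('week', 'timestamp')) \
    .withColumn('month_start', date_trunc('month', 'timestamp')) \
    .show(truncate=False)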