问题
I am trying to figure out, how to extract a date from a datetime value using Pyspark sql.
The datetime values look like this:
DateTime
2018-05-21T00:00:00.000-04:00
2016-02-22T02:00:02.234-06:00
When I now load this into a spark dataframe and try to extract the date (via
Date() or
Timestamp() and then Date()
I always get the error, that a date or timestamp value is expected, but a DateTime value was provided.
Can someone help me with retrieving the date from this value? I think, you need to provide a timezone for that - but since I already had problems extracting only the date, I first wanted to solve this.
Thank you and kind regards.
回答1:
Pyspark has a to_date
function to extract the date from a timestamp. In your example you could create a new column with just the date by doing the following:
df = df.withColumn("date_only", func.to_date(func.col("DateTime")))
If the column you are trying to convert is a string you can set the format
parameter of to_date
specifying the datetime format of the string.
You can read more about to_date
in the documentation here.
回答2:
You can use either date_format (or) from_unixtime (or) to_date functions to extract date from the input string.
Example:
Input data df data as follows..
>>> df.show(10,False)
+-----------------------------+
|ts |
+-----------------------------+
|2018-05-21T00:00:00.000-04:00|
|2016-02-22T02:00:02.234-06:00|
+-----------------------------+
1. Using date_format function:
>>> df.select(date_format(col('ts'),"yyyy-MM-dd").alias('ts').cast("date")).show(10,False)
+----------+
|ts |
+----------+
|2018-05-21|
|2016-02-22|
+----------+
2. Using to_date function:
>>> df.select(to_date(col('ts')).alias('ts').cast("date")).show(10,False)
+----------+
|ts |
+----------+
|2018-05-21|
|2016-02-22|
+----------+
3. Using from_unixtime and unix_timestamp functions:
>>> df.select(from_unixtime(unix_timestamp(col('ts'),"yyyy-MM-dd'T'HH:mm:ss.SSS"),"yyyy-MM-dd").alias("ts").cast("date")).show(10,False)
+----------+
|ts |
+----------+
|2018-05-21|
|2016-02-22|
+----------+
来源:https://stackoverflow.com/questions/51880774/pyspark-extract-date-from-datetime-value