Extract date from a string column containing timestamp in Pyspark

Submitted by 你。 on 2019-12-04 12:48:55

Question


I have a DataFrame with a date column in the following format:

+----------------------+
|date                  |
+----------------------+
|May 6, 2016 5:59:34 AM|
+----------------------+

I want to extract the date in YYYY-MM-DD format, so for the date above the result should be 2016-05-06.

But when I extract it using the following:

from pyspark.sql.functions import from_unixtime, unix_timestamp

df.withColumn('part_date', from_unixtime(unix_timestamp(df.date, "MMM dd, YYYY hh:mm:ss aa"), "yyyy-MM-dd"))

I get this date instead:

2015-12-27

Can anyone advise? I don't want to convert my DataFrame to an RDD just to use Python's datetime functions; I want to do this within the DataFrame itself.
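The reported 2015-12-27 is consistent with 'YYYY' being read as a week-based year rather than a calendar year. With a week year but no week number, the legacy Java SimpleDateFormat parser (which Spark used before 3.0) appears to resolve to the first day of week 1 and ignore the parsed month and day. A minimal pure-Python sketch of that rule, assuming US locale defaults (weeks start on Sunday, and week 1 is the week containing January 1); the helper name us_week_year_start is hypothetical:

```python
import datetime


def us_week_year_start(week_year: int) -> datetime.date:
    """Return the first day of week 1 of a week-based year, assuming US
    rules: weeks start on Sunday and week 1 contains January 1."""
    jan1 = datetime.date(week_year, 1, 1)
    # date.weekday() returns Monday=0 ... Sunday=6, so this is the number
    # of days between January 1 and the Sunday on or before it.
    days_since_sunday = (jan1.weekday() + 1) % 7
    return jan1 - datetime.timedelta(days=days_since_sunday)


print(us_week_year_start(2016))  # 2015-12-27
```

January 1, 2016 fell on a Friday, so week 1 of week-year 2016 begins on Sunday, December 27, 2015, which matches the wrong date in the question.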


Answer 1:


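There are some errors in your pattern. 'YYYY' is the week-based year; the calendar year is 'yyyy'. Also, since the day and hour in your input are not zero-padded, 'd' and 'h' describe them more accurately than 'dd' and 'hh'. Here's a suggestion: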

from pyspark.sql.functions import from_unixtime, unix_timestamp

from_pattern = 'MMM d, yyyy h:mm:ss aa'  # lowercase yyyy = calendar year
to_pattern = 'yyyy-MM-dd'
df.withColumn('part_date', from_unixtime(unix_timestamp(df['date'], from_pattern), to_pattern)).show()
+----------------------+----------+
|date                  |part_date |
+----------------------+----------+
|May 6, 2016 5:59:34 AM|2016-05-06|
+----------------------+----------+


Source: https://stackoverflow.com/questions/37330866/extract-date-from-a-string-column-containing-timestamp-in-pyspark
