How to convert date to the first day of month in a PySpark Dataframe column?

Submitted by 此生再无相见时 on 2020-05-23 12:50:26

Question


I have the following DataFrame:

+----------+
|      date|
+----------+
|2017-11-25|
|2017-12-21|
|2017-09-12|
+----------+

Here is the code to create the above DataFrame:

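from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()
rdd = spark.sparkContext.parallelize([("2017/11/25",), ("2017/12/21",), ("2017/09/12",)])
# Parse the "yyyy/MM/dd" strings into a DateType column
df = spark.createDataFrame(rdd, ["date"]).withColumn("date", f.to_date(f.col("date"), "yyyy/MM/dd"))
df.show()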

I want a new column holding the first day of the month for each row, i.e. the same date with the day replaced by "01":

+----------+----------+
|      date|first_date|
+----------+----------+
|2017-11-25|2017-11-01|
|2017-12-21|2017-12-01|
|2017-09-12|2017-09-01|
+----------+----------+

There is a last_day function in pyspark.sql.functions; however, there is no first_day function.

I tried using date_sub to do this, but it did not work: I get a "Column is not iterable" error because the second argument to date_sub cannot be a column and has to be an integer.

f.date_sub(f.col('date'), f.dayofmonth(f.col('date')) - 1 )

Answer 1:


You can use trunc:

df.withColumn("first_date", f.trunc("date", "month")).show()

+----------+----------+
|      date|first_date|
+----------+----------+
|2017-11-25|2017-11-01|
|2017-12-21|2017-12-01|
|2017-09-12|2017-09-01|
+----------+----------+



Answer 2:


I suppose it is a syntax error. Can you please change f.dayofmonth to dayofmonth and try? The expression looks fine.

f.date_sub(f.col('Match_date'),dayofmonth(f.col('Match_date')) - 1 ) 


Source: https://stackoverflow.com/questions/48349048/how-to-convert-date-to-the-first-day-of-month-in-a-pyspark-dataframe-column
