How to get year and week number aligned for a date


Question


While trying to get the year and week number for a range of dates spanning multiple years, I am running into some issues at the start/end of the year.

I understand the logic for the week number and for the year when they run separately. However, when they are combined, in some cases they do not produce consistent results, and I was wondering what the best way is in Spark to make sure those scenarios are handled with a year that is consistent with the given week number.

For example, running:

spark.sql("select year('2017-01-01') as year, weekofyear('2017-01-01') as weeknumber").show(false)

outputs:

+----+----------+
|year|weeknumber|
+----+----------+
|2017|52        |
+----+----------+

But the desired output would be:

+----+----------+
|year|weeknumber|
+----+----------+
|2016|52        |
+----+----------+

and running:

spark.sql("select year('2018-12-31') as year, weekofyear('2018-12-31') as weeknumber").show(false)

produces:

+----+----------+
|year|weeknumber|
+----+----------+
|2018|1         |
+----+----------+

But the expected output is:

+----+----------+
|year|weeknumber|
+----+----------+
|2019|1         |
+----+----------+

Code is running on Spark 2.4.2.


Answer 1:


This Spark behavior is consistent with the ISO 8601 definition: weekofyear uses ISO week numbering, where week 1 is the week containing the first Thursday of the year, while year returns the plain calendar year, so the two can disagree around year boundaries. You cannot change this behavior, but there is a workaround I can think of.

You can first determine the day of the week: if it is less than 4 (Monday to Wednesday), increase the year by one; if it equals 4 (Thursday), keep the year untouched; otherwise decrease the year by one. Be aware that this adjustment is only correct for dates that sit on the opposite side of a year boundary from their week's Thursday; applied to any other date it shifts the year when it should not (a more general variant is sketched after the two examples below).

Example with 2017-01-01

sql("select case when date_format('2017-01-01', 'u') < 4 then year('2017-01-01')+1 when date_format('2017-01-01', 'u') = 4 then year('2017-01-01') else year('2017-01-01')- 1 end as year, weekofyear('2017-01-01') as weeknumber, date_format('2017-01-01', 'u') as dayOfWeek").show(false)

+----+----------+---------+
|year|weeknumber|dayOfWeek|
+----+----------+---------+
|2016|52        |7        |
+----+----------+---------+

Example with 2018-12-31

sql("select case when date_format('2018-12-31', 'u') < 4 then year('2018-12-31')+1 when date_format('2018-12-31', 'u') = 4 then year('2018-12-31') else year('2018-12-31')- 1 end as year, weekofyear('2018-12-31') as weeknumber, date_format('2018-12-31', 'u') as dayOfWeek").show(false)

+----+----------+---------+
|year|weeknumber|dayOfWeek|
+----+----------+---------+
|2019|1         |1        |
+----+----------+---------+
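
A more general variant of the same idea, shown as a minimal sketch assuming Spark 2.4 (where date_format(..., 'u') returns the ISO day of week as a string, '1' = Monday): shift every date to the Thursday of its ISO week and take that Thursday's calendar year, so no CASE is needed and mid-year dates are left untouched. The third date here is a hypothetical mid-year example added for illustration:

spark.sql("""select dt,
       weekofyear(dt) as weeknumber,
       year(date_add(dt, 4 - cast(date_format(dt, 'u') as int))) as iso_year
from values ('2017-01-01'), ('2018-12-31'), ('2018-06-15') as t(dt)""").show(false)

This should yield (2016, 52), (2019, 1) and (2018, 24) respectively, matching both boundary cases above.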



Answer 2:


Another option is to correct the year only when the week number and the month of the date disagree:

val df = Seq("2017-01-01", "2018-12-31").toDF("dateval")
df.show()

+----------+
|   dateval|
+----------+
|2017-01-01|
|2018-12-31|
+----------+

df.createOrReplaceTempView("date_tab")

val newDF = spark.sql("""select dateval,
                    case when weekofyear(dateval)=1 and month(dateval)=12 then struct((year(dateval)+1) as yr, weekofyear(dateval) as wk)
                         -- >= 52 so that January dates falling in week 53 of the previous year are handled too
                         when weekofyear(dateval)>=52 and month(dateval)=1 then struct((year(dateval)-1) as yr, weekofyear(dateval) as wk)
                         else struct(year(dateval) as yr, weekofyear(dateval) as wk) end as week_struct
             from date_tab""")

newDF.select($"dateval", $"week_struct.yr", $"week_struct.wk").show()

+----------+----+---+
|   dateval|  yr| wk|
+----------+----+---+
|2017-01-01|2016| 52|
|2018-12-31|2019|  1|
+----------+----+---+
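
The same adjustment can also be written with the DataFrame API instead of a temp view. A minimal sketch under the same assumptions (the df defined above with its dateval string column; >= 52 again covering 53-week years):

import org.apache.spark.sql.functions.{col, month, weekofyear, when, year}

// Correct the calendar year only when the week number and the month disagree.
val adjusted = df
  .withColumn("wk", weekofyear(col("dateval")))
  .withColumn("yr",
    when(col("wk") === 1 && month(col("dateval")) === 12, year(col("dateval")) + 1)
      .when(col("wk") >= 52 && month(col("dateval")) === 1, year(col("dateval")) - 1)
      .otherwise(year(col("dateval"))))

adjusted.select("dateval", "yr", "wk").show()

This avoids the round-trip through a temporary view and keeps the logic in one expression per column.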


Source: https://stackoverflow.com/questions/62968801/how-to-get-year-and-week-number-aligned-for-a-date
