Spark date format issue


Question


I have observed weird behavior in Spark date formatting. I need to convert a two-digit year (yy) to a four-digit year (yyyy), and after the conversion the year should be 20yy.

I have tried the following, and it fails for years after 2040.

import org.apache.spark.sql.functions._
import spark.implicits._   // needed for .toDF and $"..." (available by default in spark-shell)

val df = Seq(("06/03/35"), ("07/24/40"), ("11/15/43"), ("12/15/12"), ("11/15/20"), ("12/12/22")).toDF("Date")

df.withColumn("newdate", from_unixtime(unix_timestamp($"Date", "mm/dd/yy"), "mm/dd/yyyy")).show

+--------+----------+
|    Date|   newdate|
+--------+----------+
|06/03/35|06/03/2035|
|07/24/40|07/24/2040|
|11/15/43|11/15/1943|  // here the year is prefixed with 19
|12/15/12|12/15/2012|
|11/15/20|11/15/2020|
|12/12/22|12/12/2022|
+--------+----------+

Why does this happen? Is there a date utility function I can use directly, without prepending "20" to the string date?


Answer 1:


Parsing 2-digit year strings is subject to some relative interpretation that is documented in the SimpleDateFormat docs:

For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964.

So, 2043 being more than 20 years in the future, the parser resolves the 2-digit year to 1943, as documented.
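As a minimal sketch (not from the original post), assuming the code runs around 2020 as in the question, the default window covers roughly 1940 to 2040; the exact cutoff depends on when the SimpleDateFormat instance is created. Note that "MM" is the month-of-year pattern, whereas the lowercase "mm" used in the question actually means minutes:

import java.text.SimpleDateFormat

// "MM" = month-of-year; the question's "mm" means minutes.
val fmt = new SimpleDateFormat("MM/dd/yy")

// With an instance created in 2020, the default window is roughly 1940-2040:
fmt.parse("07/24/40")   // within 20 years ahead -> interpreted as 2040
fmt.parse("11/15/43")   // beyond the window     -> interpreted as 1943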

Here's one approach using a UDF that explicitly calls set2DigitYearStart on a SimpleDateFormat object before parsing the date (I picked 1980 just as an example):

import java.text.SimpleDateFormat
import java.util.Calendar
import java.sql.Date

def parseDate(date: String, pattern: String): Date = {
    val format = new SimpleDateFormat(pattern)

    // Anchor the 100-year window for 2-digit years so that "yy" is
    // interpreted as a year from 1980 onwards (roughly 1980-2080).
    val cal = Calendar.getInstance()
    cal.set(Calendar.YEAR, 1980)
    format.set2DigitYearStart(cal.getTime)

    new Date(format.parse(date).getTime)
}

And then:

val custom_to_date = udf(parseDate _)
df.withColumn("newdate", custom_to_date($"Date", lit("mm/dd/yy"))).show(false)
+--------+----------+
|Date    |newdate   |
+--------+----------+
|06/03/35|2035-01-03|
|07/24/40|2040-01-24|
|11/15/43|2043-01-15|
|12/15/12|2012-01-15|
|11/15/20|2020-01-15|
|12/12/22|2022-01-12|
+--------+----------+

Knowing your data, you can decide which value to pass to set2DigitYearStart().
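For instance, if every 2-digit year in your data should map to 20yy, one option (a sketch of my own, not from the original answer) is to anchor the window at the year 2000:

val cal = Calendar.getInstance()
cal.set(Calendar.YEAR, 2000)

val format = new SimpleDateFormat("MM/dd/yy")
format.set2DigitYearStart(cal.getTime)

// All 2-digit years now resolve to 2000-2099:
format.parse("11/15/43")   // interpreted as 2043
format.parse("12/15/12")   // interpreted as 2012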



Source: https://stackoverflow.com/questions/60629893/spark-date-format-issue
