Getting correct offset for timezone using current_timestamp in apache spark

Posted by  ̄綄美尐妖づ on 2021-01-29 16:18:27

Question


I am new to both Java and Apache Spark and am trying to understand how timestamps and timezones work. I would like all timestamps that I write from an Apache Spark DataFrame to SQL Server to be stored in the EST timezone.

When I use current_timestamp, I get the correct EST time, but the offset I see in the stored data is '+00:00' instead of '-04:00'.

Here is a value stored in the database that was passed in from the Spark dataset: 2020-04-07 11:36:23.0220 +00:00

From what I can see, current_timestamp does not accept a timezone argument. The time itself is correct (it is in EST), but I don't understand why the offset is wrong.

Any help to understand this would be great.


Answer 1:


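Java Timestamp values (java.sql.Timestamp, which is what Spark hands over for timestamp columns) work more or less like LocalDateTime in Java: they don't contain timezone or offset information. The database therefore interprets the value as a UTC timestamp, which is why you see the EST wall-clock time paired with a +00:00 offset. I usually use one of two approaches, depending on which suits better: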
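To make the mismatch concrete, here is a minimal sketch in plain java.time (no Spark involved). The class name OffsetMismatchDemo and the method interpretIn are hypothetical names chosen for illustration, and America/New_York is assumed as the zone behind "EST". It shows that the same zone-less wall-clock value ends up with a different offset depending on which zone the reader assumes:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class OffsetMismatchDemo {

    // Attach an offset to a zone-less wall-clock value by interpreting it
    // in the given zone. The wall-clock digits stay the same; only the
    // offset (and therefore the actual instant) changes.
    public static OffsetDateTime interpretIn(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toOffsetDateTime();
    }

    public static void main(String[] args) {
        // The wall-clock time from the question, without any zone attached.
        LocalDateTime wallClock = LocalDateTime.of(2020, 4, 7, 11, 36, 23);

        // What a reader that assumes UTC sees: offset Z (+00:00).
        System.out.println(interpretIn(wallClock, ZoneOffset.UTC));

        // What was intended: US Eastern time, which is -04:00 on this date.
        System.out.println(interpretIn(wallClock, ZoneId.of("America/New_York")));
    }
}
```

Both printed values show 11:36:23, but they denote instants four hours apart, which matches the symptom in the question.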

  1. Return a UTC timestamp from Spark (with a simple custom UDF) instead of using current_timestamp, whose result depends on the timezone in effect.
  2. Encode your dates as Strings that include the offset. This can likewise be done with a simple UDF built on the java.time API.
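The conversion logic behind both approaches can be sketched with java.time alone. This is only an illustration under stated assumptions: the class TimestampEncoding and its two methods are hypothetical names, the output pattern is one guess at a format the database would accept, and the wrapping of these methods in a Spark UDF is left out so that the block stays self-contained.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampEncoding {

    // Assumed output pattern; "xxx" prints the offset as +HH:MM / -HH:MM
    // (and +00:00 rather than "Z" for UTC).
    private static final DateTimeFormatter WITH_OFFSET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS xxx");

    // Approach 1: express the instant as UTC wall-clock time, so that a
    // reader who assumes UTC gets the right moment.
    public static LocalDateTime toUtcWallClock(Instant instant) {
        return LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
    }

    // Approach 2: render the instant as a string that carries its own
    // offset for the requested zone, leaving nothing for the reader to guess.
    public static String toOffsetString(Instant instant, ZoneId zone) {
        return WITH_OFFSET.format(instant.atZone(zone));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(toUtcWallClock(now));
        System.out.println(toOffsetString(now, ZoneId.of("America/New_York")));
    }
}
```

For the instant 2020-04-07T15:36:23.022Z, toUtcWallClock yields 2020-04-07T15:36:23.022 and toOffsetString with America/New_York yields "2020-04-07 11:36:23.022 -04:00". Using a region ID such as America/New_York rather than a fixed offset lets the result follow daylight saving changes.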

Hope things are a bit clearer now.



Source: https://stackoverflow.com/questions/61084442/getting-correct-offset-for-timezone-using-current-timestamp-in-apache-spark
