Apache Spark - how to set timezone to UTC? currently defaulted to Zulu

笑着哭i submitted on 2020-06-07 06:08:23

Question


In Spark's WebUI (port 8080), the Environment tab shows the following setting:

user.timezone Zulu

Do you know how/where I can override this to UTC?

Env details:

  • Spark 2.1.1
  • jre-1.8.0-openjdk.x86_64
  • no jdk
  • EC2 Amazon Linux

EDIT (someone posted the link below as an answer, then deleted it): https://www.timeanddate.com/time/zones/z


Answer 1:


Change your system timezone and check again. Hopefully that will work.




Answer 2:


As of Spark 2.2.0 you can use:

spark.conf.set("spark.sql.session.timeZone", "UTC")

This setting was added by https://issues.apache.org/jira/browse/SPARK-18936.

EDIT:

Additionally, I set the JVM default TimeZone to UTC to avoid implicit conversions:

import java.util.TimeZone
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

Otherwise, a timestamp that carries no timezone information is implicitly converted from your default timezone to UTC.

Example:

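import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

// sparkJob is not defined in this answer; presumably its `spark` field is the SparkSession.
val spark = sparkJob.spark
import spark.implicits._ // supplies the Encoder[String] that createDataset needs

// The timestamp string carries no timezone information.
val rawJson = """ {"some_date_field": "2018-09-14 16:05:37"} """

val dsRaw = spark.createDataset(Seq(rawJson))

// Parse the JSON string into a struct with a single TimestampType field.
val output =
  dsRaw
    .select(
      from_json(
        col("value"),
        new StructType(
          Array(
            StructField("some_date_field", DataTypes.TimestampType)
          )
        )
      ).as("parsed")
    ).select("parsed.*")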

If my default TimeZone is Europe/Dublin (GMT+1 on that date) and the Spark SQL session timezone is set to UTC, Spark assumes that "2018-09-14 16:05:37" is in the Europe/Dublin timezone and converts it, so the result is "2018-09-14 15:05:37".
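The arithmetic behind that one-hour shift can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's actual parsing code: it assumes a fixed +01:00 offset to stand in for Europe/Dublin on that date (Irish summer time), and the helper name `interpret_as` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

RAW = "2018-09-14 16:05:37"  # no zone information in the string

# Assumption: Europe/Dublin is at UTC+01:00 on this date (summer time).
DUBLIN_SUMMER = timezone(timedelta(hours=1))


def interpret_as(raw: str, assumed_zone: timezone) -> datetime:
    """Parse a zone-less timestamp, treat it as wall-clock time in
    assumed_zone, and return the same instant expressed in UTC."""
    naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=assumed_zone).astimezone(timezone.utc)


# Default zone is Dublin: the value is shifted back by one hour.
as_dublin = interpret_as(RAW, DUBLIN_SUMMER)
# Default zone is UTC: the value is left unchanged.
as_utc = interpret_as(RAW, timezone.utc)

print(as_dublin.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-09-14 15:05:37
print(as_utc.strftime("%Y-%m-%d %H:%M:%S"))     # 2018-09-14 16:05:37
```

When the assumed zone is already UTC the value passes through untouched, which is why setting the default TimeZone to UTC removes the surprise.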




Answer 3:


In some cases you will also want to set the JVM timezone. For example, when loading data into a TimestampType column, Spark interprets the string in the local JVM timezone. To set the JVM timezone, add extra JVM options for both the driver and the executors:

from pyspark.sql import SparkSession

# Set the JVM default zone on driver and executors, plus the SQL session zone.
spark = (
    SparkSession.builder
    .appName('test')
    .master('local')
    .config('spark.driver.extraJavaOptions', '-Duser.timezone=GMT')
    .config('spark.executor.extraJavaOptions', '-Duser.timezone=GMT')
    .config('spark.sql.session.timeZone', 'UTC')
    .getOrCreate()
)

We do this in our local unit test environment, since our local time is not GMT.
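If several test suites need the same setup, the three settings can be generated from a single zone ID so they cannot drift apart. This is only a sketch under assumptions: the helper names `timezone_conf` and `build_session` are hypothetical, and unlike the snippet above (GMT for the JVM, UTC for the session) it uses one zone ID for all three keys.

```python
def timezone_conf(zone_id: str = "UTC") -> dict:
    """Return the three Spark settings from the answer, all set to zone_id."""
    # Note: this value replaces any other extraJavaOptions already configured.
    jvm_flag = f"-Duser.timezone={zone_id}"
    return {
        "spark.driver.extraJavaOptions": jvm_flag,
        "spark.executor.extraJavaOptions": jvm_flag,
        "spark.sql.session.timeZone": zone_id,
    }


def build_session(app_name: str = "test", zone_id: str = "UTC"):
    """Build a local SparkSession with every timezone setting applied."""
    # Imported lazily so timezone_conf works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master("local")
    for key, value in timezone_conf(zone_id).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = timezone_conf("UTC")
print(conf["spark.driver.extraJavaOptions"])  # -Duser.timezone=UTC
```

A test fixture could then call `build_session()` once and share the session across tests.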

Useful reference: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones



Source: https://stackoverflow.com/questions/49644232/apache-spark-how-to-set-timezone-to-utc-currently-defaulted-to-zulu
