Spark timestamp difference

て烟熏妆下的殇ゞ submitted on 2019-12-23 19:42:18

Question


I am trying to do a timestamp difference in Spark and it is not working as expected.

Below is what I'm trying:

import static org.apache.spark.sql.functions.*;
df = df.withColumn("TimeStampDiff", from_unixtime(unix_timestamp(df.col("TimeStampHigh"), "HH:mm:ss").minus(unix_timestamp(df.col("TimeStampLow"), "HH:mm:ss")),"HH:mm:ss"));

Values

TimeStampHigh - 15:57:01
TimeStampLow - 00:11:57

It returns 10:45:04, but the expected output is 15:45:04.

My other option is to fall back to a UDF implemented in Java.

Any pointers will help.


Answer 1:


That's because of how from_unixtime is documented (note the part about the current system time zone):

Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.

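The difference, 56704 seconds, is treated as a moment that many seconds after the epoch, which is 15:45:04 in UTC. Your result is five hours behind that, so clearly your system or JVM is not configured to use UTC time.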
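The effect can be reproduced outside Spark. The following is a minimal sketch using only java.time, not Spark's actual implementation; the class and method names (EpochOffsetDemo, diffSeconds, formatEpochSeconds) are made up for illustration. It formats the same number of seconds once in a UTC-5 zone and once in UTC:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class EpochOffsetDemo {
    // Difference in seconds between two HH:mm:ss strings (hypothetical helper).
    static long diffSeconds(String high, String low) {
        return Duration.between(LocalTime.parse(low), LocalTime.parse(high)).getSeconds();
    }

    // Formats a count of seconds since the Unix epoch as HH:mm:ss in the given zone.
    // This imitates the zone-dependent formatting step described in the quote above.
    static String formatEpochSeconds(long seconds, ZoneId zone) {
        return DateTimeFormatter.ofPattern("HH:mm:ss")
                .withZone(zone)
                .format(Instant.ofEpochSecond(seconds));
    }

    public static void main(String[] args) {
        long diff = diffSeconds("15:57:01", "00:11:57");
        System.out.println(diff);                                              // 56704
        System.out.println(formatEpochSeconds(diff, ZoneOffset.ofHours(-5))); // 10:45:04
        System.out.println(formatEpochSeconds(diff, ZoneOffset.UTC));         // 15:45:04
    }
}
```

The subtraction itself is correct in both cases; only the final formatting step depends on the zone.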

You should do one of the following:

  • Configure the JVM to use the appropriate time zone (-Duser.timezone=UTC in both spark.executor.extraJavaOptions and spark.driver.extraJavaOptions).
  • Set spark.sql.session.timeZone to the appropriate time zone.

Example:

scala> val df = Seq(("15:57:01", "00:11:57")).toDF("TimeStampHigh", "TimeStampLow")
df: org.apache.spark.sql.DataFrame = [TimeStampHigh: string, TimeStampLow: string]

scala> spark.conf.set("spark.sql.session.timeZone", "GMT-5")  // Equivalent to your current settings

scala> df.withColumn("TimeStampDiff", from_unixtime(unix_timestamp(df.col("TimeStampHigh"), "HH:mm:ss").minus(unix_timestamp(df.col("TimeStampLow"), "HH:mm:ss")),"HH:mm:ss")).show
+-------------+------------+-------------+
|TimeStampHigh|TimeStampLow|TimeStampDiff|
+-------------+------------+-------------+
|     15:57:01|    00:11:57|     10:45:04|
+-------------+------------+-------------+


scala> spark.conf.set("spark.sql.session.timeZone", "UTC")  // With UTC

scala> df.withColumn("TimeStampDiff", from_unixtime(unix_timestamp(df.col("TimeStampHigh"), "HH:mm:ss").minus(unix_timestamp(df.col("TimeStampLow"), "HH:mm:ss")),"HH:mm:ss")).show
+-------------+------------+-------------+
|TimeStampHigh|TimeStampLow|TimeStampDiff|
+-------------+------------+-------------+
|     15:57:01|    00:11:57|     15:45:04|
+-------------+------------+-------------+
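If you do end up writing the Java UDF mentioned in the question, its core logic could avoid time zones entirely by treating the result as a duration rather than a timestamp. This is only a sketch of that logic, assuming both values are HH:mm:ss strings on the same day with the high value not earlier than the low one; the TimeDiff class and its between method are hypothetical names, and registering the function as a Spark UDF is not shown:

```java
import java.time.Duration;
import java.time.LocalTime;

class TimeDiff {
    // Returns high - low formatted as HH:mm:ss, with no time zone involved.
    // Assumes same-day HH:mm:ss inputs; rejects a negative difference.
    static String between(String high, String low) {
        Duration d = Duration.between(LocalTime.parse(low), LocalTime.parse(high));
        if (d.isNegative()) {
            throw new IllegalArgumentException("high must not be earlier than low");
        }
        long s = d.getSeconds();
        return String.format("%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
    }

    public static void main(String[] args) {
        System.out.println(between("15:57:01", "00:11:57")); // 15:45:04
    }
}
```

Because no epoch conversion happens here, the result is the same whatever the JVM or session time zone is.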


Source: https://stackoverflow.com/questions/49878598/spark-timestamp-difference