How to convert a string column with milliseconds to a timestamp with milliseconds in Spark 2.1 using Scala?

日久生厌 2020-12-06 06:42

I am using Spark 2.1 with Scala.

How to convert a string column with milliseconds to a timestamp with milliseconds?

I tried the following code from the quest

3 Answers
  • 2020-12-06 07:09

    There is an easier way than writing a UDF: parse the milliseconds separately and add them to the Unix timestamp. The following code works with PySpark and should be very close to the Scala equivalent:

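    from pyspark.sql.functions import substring, unix_timestamp
    timeFmt = "yyyy/MM/dd HH:mm:ss.SSS"
    # unix_timestamp returns whole seconds, so add the last three characters back as a fraction
    df = df.withColumn('ux_t', unix_timestamp(df.t, format=timeFmt) + substring(df.t, -3, 3).cast('float')/1000)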
    

    Result: '2017/03/05 14:02:41.865' is converted to 1488722561.865
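
    A Scala version of the same idea might look like the sketch below. The local SparkSession, the one-row sample DataFrame, and the extra `ts` column are assumptions added for illustration; in spark-shell the `spark` session already exists.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{substring, unix_timestamp}

    // Assumed local session for a standalone run; spark-shell already provides `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("millis-sketch").getOrCreate()
    import spark.implicits._

    val timeFmt = "yyyy/MM/dd HH:mm:ss.SSS"
    // Assumed sample data: one string in the format used above.
    val df = Seq("2017/03/05 14:02:41.865").toDF("t")

    val withTs = df
      // Whole seconds from unix_timestamp plus the last three characters as a fraction of a second.
      .withColumn("ux_t", unix_timestamp($"t", timeFmt) + substring($"t", -3, 3).cast("double") / 1000)
      // Casting a double to timestamp reads it as seconds since the epoch, fraction included.
      .withColumn("ts", $"ux_t".cast("timestamp"))

    withTs.show(false)
    ```

    Because `ux_t` is a double, the `ts` value may be off by a microsecond after the cast, so treat this as a sketch rather than an exact conversion.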

  • 2020-12-06 07:10
    import org.apache.spark.sql.functions;
    import org.apache.spark.sql.types.DataTypes;
    
    
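    // Casting a number to TimestampType reads it as seconds since the epoch,
    // so divide the millisecond value by 1000 first to keep the milliseconds.
    dataFrame.withColumn(
        "time_stamp",
        dataFrame.col("milliseconds_in_string")
            .cast(DataTypes.DoubleType)
            .divide(1000)
            .cast(DataTypes.TimestampType)
    );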
    

    The code is in Java and is easy to convert to Scala.
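
    A possible Scala conversion is sketched below. It assumes the column holds epoch milliseconds as strings; the local SparkSession and the sample value are made up for illustration. Since a numeric-to-timestamp cast reads the number as seconds, the sketch divides by 1000 before casting.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{DoubleType, TimestampType}

    // Assumed local session for a standalone run; spark-shell already provides `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("epoch-millis-sketch").getOrCreate()
    import spark.implicits._

    // Assumed sample data: epoch milliseconds stored as a string.
    val dataFrame = Seq("1488722561865").toDF("milliseconds_in_string")

    val result = dataFrame.withColumn(
      "time_stamp",
      // Scale milliseconds down to fractional seconds, then cast to a timestamp.
      ($"milliseconds_in_string".cast(DoubleType) / 1000).cast(TimestampType)
    )

    result.show(false)
    ```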

  • 2020-12-06 07:28

    A UDF with SimpleDateFormat works. The idea comes from the UDF logic in the link that Ram Ghadiyaram posted.

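    import java.text.SimpleDateFormat
    import java.sql.Timestamp
    import org.apache.spark.sql.functions.udf
    import scala.util.{Try, Success, Failure}
    // toDF and $"..." below need the session implicits; assumes a SparkSession named spark, as in spark-shell
    import spark.implicits._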
    
    val getTimestamp: (String => Option[Timestamp]) = s => s match {
      case "" => None
      case _ => {
        val format = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss.SSS")
        Try(new Timestamp(format.parse(s).getTime)) match {
          case Success(t) => Some(t)
          case Failure(_) => None
        }    
      }
    }
    
    val getTimestampUDF = udf(getTimestamp)
    val tdf = Seq((1L, "05/26/2016 01:01:01.601"), (2L, "#$@#@#")).toDF("id", "dts")
    val tts = getTimestampUDF($"dts")
    tdf.withColumn("ts", tts).show(2, false)
    

    with output:

    +---+-----------------------+-----------------------+
    |id |dts                    |ts                     |
    +---+-----------------------+-----------------------+
    |1  |05/26/2016 01:01:01.601|2016-05-26 01:01:01.601|
    |2  |#$@#@#                 |null                   |
    +---+-----------------------+-----------------------+
    