Find minimum for a timestamp through Spark groupBy dataframe

情歌与酒 · 2021-01-04 22:29

When I try to group my dataframe on a column and then find the minimum for each group with groupbyDatafram.min('timestampCol'), it appears I cannot do it on non-numeric columns. How can I get the minimum (earliest) timestamp for each group?
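A minimal sketch of the failing call, assuming a DataFrame df with an id column and a timestamp column ts (names are illustrative):

    // The groupBy(...).min(...) helper only aggregates numeric columns,
    // so pointing it at a timestamp column is rejected:
    df.groupBy("id").min("ts")
    // => fails with an AnalysisException (ts is not a numeric column)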

1 Answer
  • 2021-01-04 23:28

    Just perform the aggregation directly instead of using the min helper:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.min
    
    val sqlContext: SQLContext = ???  // obtain your existing SQLContext here
    
    import sqlContext.implicits._
    
    // Sample data: cast the string column to a proper timestamp type.
    val df = Seq((1L, "2016-04-05 15:10:00"), (1L, "2014-01-01 15:10:00"))
      .toDF("id", "ts")
      .withColumn("ts", $"ts".cast("timestamp"))
    
    // functions.min inside agg works on any orderable column type.
    df.groupBy($"id").agg(min($"ts")).show
    
    // +---+--------------------+
    // | id|             min(ts)|
    // +---+--------------------+
    // |  1|2014-01-01 15:10:...|
    // +---+--------------------+
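
    On Spark 2.0+, SparkSession replaces SQLContext; a minimal self-contained sketch of the same aggregation (the local master is an assumption for illustration only):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.min
    
    val spark = SparkSession.builder()
      .master("local[*]")  // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._
    
    val df2 = Seq((1L, "2016-04-05 15:10:00"), (1L, "2014-01-01 15:10:00"))
      .toDF("id", "ts")
      .withColumn("ts", $"ts".cast("timestamp"))
    
    df2.groupBy($"id").agg(min($"ts")).show()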
    

    Unlike the min helper, this works on any orderable type, not just numeric columns.
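
    For instance, the same pattern works on a string column, which the numeric-only groupBy(...).min(...) helper would reject; a short sketch reusing the implicits above (the name column is made up for illustration):

    // Strings are ordered lexicographically, so min picks "alpha".
    val names = Seq((1L, "beta"), (1L, "alpha")).toDF("id", "name")
    names.groupBy($"id").agg(min($"name")).show
    
    // +---+---------+
    // | id|min(name)|
    // +---+---------+
    // |  1|    alpha|
    // +---+---------+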
