Get min and max from a specific column in a Scala Spark dataframe

梦谈多话 2021-02-01 04:37

I would like to access the min and max of a specific column from my dataframe, but I don't have the header of the column, just its number. What should I do, using Scala?

7 Answers
  •  甜味超标
    2021-02-01 04:58

    Here is a direct way to get the min and max from a dataframe with column names:

    val df = Seq((1, 2), (3, 4), (5, 6)).toDF("A", "B")
    
    df.show()
    /*
    +---+---+
    |  A|  B|
    +---+---+
    |  1|  2|
    |  3|  4|
    |  5|  6|
    +---+---+
    */
    
    df.agg(min("A"), max("A")).show()
    /*
    +------+------+
    |min(A)|max(A)|
    +------+------+
    |     1|     5|
    +------+------+
    */
    

    If you want the min and max values as separate variables, you can convert the result of agg() above into a Row and use Row.getInt(index) to get the Row's column values.

    val min_max = df.agg(min("A"), max("A")).head()
    // min_max: org.apache.spark.sql.Row = [1,5]
    
    val col_min = min_max.getInt(0)
    // col_min: Int = 1
    
    val col_max = min_max.getInt(1)
    // col_max: Int = 5
    
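    Since the question only has the column's position rather than its name, one way (a sketch, assuming the same `df` as above and that the target column holds integers) is to look the name up via `df.columns`, which returns the column names in order:

    ```scala
    import org.apache.spark.sql.functions.{min, max}

    // Resolve the column name from its index (hypothetical index 0 here)
    val colName = df.columns(0)
    // colName: String = "A"

    val minMax = df.agg(min(colName), max(colName)).head()
    val colMin = minMax.getInt(0)
    val colMax = minMax.getInt(1)
    ```

    If the column type is unknown, `minMax.get(0)` returns an `Any` you can inspect instead of calling a typed getter like getInt.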
