I would like to access to the min and max of a specific column from my dataframe but I don\'t have the header of the column, just its number, so I should I do using scala ?
Here is a direct way to get the min and max from a dataframe with column names:
val df = Seq((1, 2), (3, 4), (5, 6)).toDF("A", "B")
df.show()
/*
+---+---+
| A| B|
+---+---+
| 1| 2|
| 3| 4|
| 5| 6|
+---+---+
*/
df.agg(min("A"), max("A")).show()
/*
+------+------+
|min(A)|max(A)|
+------+------+
| 1| 5|
+------+------+
*/
If you want to get the min and max values as separate variables, then you can convert the result of agg()
above into a Row
and use Row.getInt(index)
to get the column values of the Row
.
val min_max = df.agg(min("A"), max("A")).head()
// min_max: org.apache.spark.sql.Row = [1,5]
val col_min = min_max.getInt(0)
// col_min: Int = 1
val col_max = min_max.getInt(1)
// col_max: Int = 5