I'm trying to figure out the best way to get the largest value in a Spark dataframe column.
Consider the following example:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumes an active SparkSession named `spark` (e.g. in spark-shell);
// its implicits are needed for toDF and for the tuple Encoder used by .as.
import spark.implicits._

val testDataFrame = Seq(
  (1.0, 4.0), (2.0, 5.0), (3.0, 6.0)
).toDF("A", "B")

val (maxA, maxB) = testDataFrame.select(max("A"), max("B"))
  .as[(Double, Double)]
  .first()

println(maxA, maxB)
The result is (3.0,6.0), which is the same value I get from testDataFrame.agg(max($"A"), max($"B")).collect()(0). However, that expression returns a Row, which prints as [3.0,6.0], rather than a tuple.
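For completeness, here is a minimal sketch of reading the values back out of that Row (getDouble is a standard Row accessor; the variable names below are just illustrative and assume the same testDataFrame and imports as above):

// agg(...).collect()(0) gives an org.apache.spark.sql.Row rather than a tuple.
val maxRow = testDataFrame.agg(max($"A"), max($"B")).collect()(0)

// Values can be read back from the Row by position (or by Spark's default
// aggregate column names such as "max(A)").
val maxAFromRow = maxRow.getDouble(0)
val maxBFromRow = maxRow.getDouble(1)

println(maxAFromRow, maxBFromRow)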