When I use DataFrame groupBy like this:
df.groupBy(df("age")).agg(Map("id" -> "count"))
I will only get a DataFrame with columns "a
Here is an example that I came across in spark-workshop:
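import org.apache.spark.sql.functions.{col, max, regexp_replace}
import spark.implicits._ // enables the 'name symbol syntax for columns

val populationDF = spark.read
  .option("inferSchema", "true") // infer column types from the data
  .option("header", "true")
  .format("csv").load("file:///databricks/driver/population.csv")
  // strip whitespace from the population strings before casting to integer
  .select('name, regexp_replace(col("population"), "\\s", "").cast("integer").as("population"))
// single-row DataFrame holding the maximum population
val maxPopulationDF = populationDF.agg(max('population).as("populationmax"))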
To get the other columns, I do a simple join between the original DF and the aggregated one:
populationDF
  .join(maxPopulationDF, populationDF.col("population") === maxPopulationDF.col("populationmax"))
  .select('name, 'populationmax)
  .show()
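
If the join feels heavy, a window function is another way to keep the other columns next to the aggregate. Below is a minimal sketch, assuming Spark 2.x or later and a local session. The object name, the helper names (withAgeCount, rowsAtMax) and the sample rows are made up for illustration and do not come from the question or the workshop.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, max}

// Hypothetical names throughout; only the Spark API calls are real.
object KeepColumnsSketch {

  // Attach the per-age count of ids to every row instead of collapsing the groups,
  // so all original columns survive.
  def withAgeCount(df: DataFrame): DataFrame =
    df.withColumn("count", count(col("id")).over(Window.partitionBy(col("age"))))

  // Keep only the row(s) whose population equals the overall maximum, without a join.
  // An empty partitionBy() puts every row in one window.
  def rowsAtMax(df: DataFrame): DataFrame =
    df.withColumn("populationmax", max(col("population")).over(Window.partitionBy()))
      .where(col("population") === col("populationmax"))
      .select("name", "populationmax")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("keep-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the question's df
    val people = Seq((1, "Ann", 30), (2, "Bob", 30), (3, "Cid", 41)).toDF("id", "name", "age")
    withAgeCount(people).show()

    // Made-up sample rows standing in for populationDF
    val population = Seq(("A", 100), ("B", 250), ("C", 250)).toDF("name", "population")
    rowsAtMax(population).show()

    spark.stop()
  }
}
```

One trade-off to keep in mind: a window with no partitioning moves all rows into a single partition, so on a large dataset the join with the one-row aggregate shown above may well be the cheaper option.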