How to get other columns when using Spark DataFrame groupby?

甜味超标 2020-11-29 22:04

When I use DataFrame groupBy like this:

df.groupBy(df("age")).agg(Map("id" -> "count"))

I will only get a DataFrame with columns "a

7 answers
  •  南笙 (original poster)
    2020-11-29 22:10

    Here is an example that I came across in spark-workshop:

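    import org.apache.spark.sql.functions._  // col, max, regexp_replace
    import spark.implicits._                 // enables the 'name symbol-to-Column syntax

    val populationDF = spark.read
                    .option("inferSchema", "true")  // the CSV option is spelled "inferSchema"; "infer-schema" is not recognized
                    .option("header", "true")
                    .format("csv").load("file:///databricks/driver/population.csv")
                    // strip whitespace from the population strings, then cast to integer
                    .select('name, regexp_replace(col("population"), "\\s", "").cast("integer").as("population"))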
    

    val maxPopulationDF = populationDF.agg(max('population).as("populationmax"))

    To get the other columns, I do a simple join between the original DataFrame and the aggregated one:

    populationDF.join(maxPopulationDF,populationDF.col("population") === maxPopulationDF.col("populationmax")).select('name, 'populationmax).show()
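
The same join-back idea can be applied to the groupBy from the question. The following is only a minimal sketch under stated assumptions: the sample rows, the `name` column, the `id_count` alias, and the helper `withGroupCounts` are all hypothetical and not part of the original post. It aggregates per `age`, then joins the result back to the original DataFrame on the grouping key so the non-grouped columns survive.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.count

object GroupByKeepColumns {

  // Hypothetical helper: count ids per age, then join the counts back
  // onto the original rows so that every original column is retained.
  def withGroupCounts(df: DataFrame): DataFrame = {
    // After groupBy + agg, only "age" and the aggregate column remain.
    val counts = df.groupBy("age").agg(count("id").as("id_count"))
    // Joining on the grouping key recovers the remaining columns.
    df.join(counts, Seq("age"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupby-keep-columns")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data (assumed schema: id, name, age).
    val df = Seq((1, "Ann", 30), (2, "Bob", 30), (3, "Cid", 25))
      .toDF("id", "name", "age")

    withGroupCounts(df).show()
    spark.stop()
  }
}
```

With this sample data, each original row keeps its `id` and `name` and gains an `id_count` column: the two rows with age 30 carry 2 and the row with age 25 carries 1. Joining with `Seq("age")` rather than an explicit `===` condition keeps a single `age` column in the result.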
    
