How to get other columns when using Spark DataFrame groupby?

甜味超标 2020-11-29 22:04

when I use DataFrame groupby like this:

df.groupBy(df("age")).agg(Map("id" -> "count"))

I will only get a DataFrame with the columns "age" and "count(id)" — the other columns of df are dropped. How can I keep them in the result?

7 Answers
  •  星月不相逢
    2020-11-29 22:09

    Maybe this solution will be helpful: a window function computes the count per group while every row keeps all of its original columns.

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext
    from pyspark.sql import functions as F
    from pyspark.sql import Window

    sc = SparkContext(conf=SparkConf())
    sqlContext = SQLContext(sc)

    name_list = [(101, 'abc', 24), (102, 'cde', 24), (103, 'efg', 22), (104, 'ghi', 21),
                 (105, 'ijk', 20), (106, 'klm', 19), (107, 'mno', 18), (108, 'pqr', 18),
                 (109, 'rst', 26), (110, 'tuv', 27), (111, 'pqr', 18), (112, 'rst', 28), (113, 'tuv', 29)]

    # A window partitioned by "age": the count is computed per age group,
    # but each row keeps all of its original columns.
    age_w = Window.partitionBy("age")
    name_age_df = sqlContext.createDataFrame(name_list, ['id', 'name', 'age'])

    name_age_count_df = name_age_df.withColumn("count", F.count("id").over(age_w)).orderBy("count")
    name_age_count_df.show()
    

    Output:

    +---+----+---+-----+
    | id|name|age|count|
    +---+----+---+-----+
    |109| rst| 26|    1|
    |113| tuv| 29|    1|
    |110| tuv| 27|    1|
    |106| klm| 19|    1|
    |103| efg| 22|    1|
    |104| ghi| 21|    1|
    |105| ijk| 20|    1|
    |112| rst| 28|    1|
    |101| abc| 24|    2|
    |102| cde| 24|    2|
    |107| mno| 18|    3|
    |111| pqr| 18|    3|
    |108| pqr| 18|    3|
    +---+----+---+-----+
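    An alternative sketch of the same idea without window functions: aggregate with groupBy, then join the counts back onto the original DataFrame to recover the other columns (the small sample data here is assumed for illustration):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [(101, 'abc', 24), (102, 'cde', 24), (107, 'mno', 18)],
        ['id', 'name', 'age'])

    # Aggregate per age, then join the counts back so id and name survive.
    counts = df.groupBy("age").agg(F.count("id").alias("count"))
    result = df.join(counts, on="age", how="inner")
    result.show()
    ```

    The join approach shuffles the data twice (once for the aggregation, once for the join), so the window version above is usually the more direct choice.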
    
