How to use groupBy to collect rows into a map?

Asked by 轮回少年 on 2020-12-08 17:46

Context

sqlContext.sql(s"""
  SELECT
    school_name,
    name,
    age
  FROM my_table
""")

Ask

Given the above query over my_table, how can I group the rows by school_name and collect each group's name and age values into a single map column?

3 Answers
  •  长情又很酷
    2020-12-08 18:10

    As of Spark 2.4 you can use the map_from_arrays function to achieve this.

    import org.apache.spark.sql.functions.{collect_list, map_from_arrays}
    import spark.implicits._   // needed for the $"col" syntax
    
    val df = spark.sql(s"""
        SELECT *
        FROM VALUES ('s1','a',1),('s1','b',2),('s2','a',1)
        AS t(school, name, age)
    """)
    
    // Collect the names and ages of each school and zip them into a name -> age map
    val df2 = df.groupBy("school")
      .agg(map_from_arrays(collect_list($"name"), collect_list($"age")).as("map"))
    
    df.show() gives:
    
    +------+----+---+
    |school|name|age|
    +------+----+---+
    |    s1|   a|  1|
    |    s1|   b|  2|
    |    s2|   a|  1|
    +------+----+---+
    
    and df2.show() gives:
    
    +------+----------------+
    |school|             map|
    +------+----------------+
    |    s2|        [a -> 1]|
    |    s1|[a -> 1, b -> 2]|
    +------+----------------+
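    
    Applied to the question's my_table, the same pattern would look roughly like the sketch below. This is only an illustration: it assumes a Spark 2.4+ SparkSession named spark, that my_table is queryable from it, and that the columns really are school_name, name and age (the output column name name_to_age is made up here):
    
    import org.apache.spark.sql.functions.{col, collect_list, map_from_arrays}
    
    // Hypothetical adaptation of the accepted approach to the asker's table.
    val rows = spark.sql("""
      SELECT school_name, name, age
      FROM my_table
    """)
    
    // Both collect_list calls run over the same rows of each group, so the i-th
    // name lines up with the i-th age; map_from_arrays zips them into a map.
    val nameToAge = rows
      .groupBy("school_name")
      .agg(map_from_arrays(collect_list(col("name")), collect_list(col("age"))).as("name_to_age"))
    
    nameToAge.show(false)
    
    On Spark versions before 2.4, which lack map_from_arrays, a common workaround is to collect_list(struct($"name", $"age")) per group and then build the map from the resulting array of structs, for example in a UDF.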
    
