Apache Spark DataFrame groupBy agg() for multiple columns

Posted by Anonymous (unverified) on 2019-12-03 03:10:03

Question:

I have a DataFrame with three columns: Id, First Name, and Last Name.

I want to group by Id and collect the First Name and Last Name columns as lists.

Example: I have a DataFrame like this:

+---+-------+--------+
|id |fName  |lName   |
+---+-------+--------+
|1  |Akash  |Sethi   |
|2  |Kunal  |Kapoor  |
|3  |Rishabh|Verma   |
|2  |Sonu   |Mehrotra|
+---+-------+--------+

and I want the output to look like this:

+---+-------------+------------------+
|id |fName        |lName             |
+---+-------------+------------------+
|1  |[Akash]      |[Sethi]           |
|2  |[Kunal, Sonu]|[Kapoor, Mehrotra]|
|3  |[Rishabh]    |[Verma]           |
+---+-------------+------------------+

Thanks in advance.

Answer 1:

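You can aggregate multiple columns in a single agg() call by passing one aggregate expression per column:

import org.apache.spark.sql.functions.collect_list  // Scala API assumed; needed for collect_list
df.groupBy("id").agg(collect_list("fName"), collect_list("lName"))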

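This returns one row per id, with the fName and lName values for that id collected into arrays. By default the result columns are named collect_list(fName) and collect_list(lName), so alias them (for example with .as("fName")) if you need the original column names. The order of elements inside each array is not guaranteed after a shuffle.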
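
For reference, here is a minimal end-to-end sketch of the answer applied to the sample data from the question. It assumes the Scala API and a local SparkSession; the object name, app name, and master setting are hypothetical and only there to make the example self-contained. The aliases and the orderBy are additions for readability, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

// Hypothetical wrapper object, used only to make the sketch runnable.
object GroupByCollectExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("groupby-collect-list")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, "Akash", "Sethi"),
      (2, "Kunal", "Kapoor"),
      (3, "Rishabh", "Verma"),
      (2, "Sonu", "Mehrotra")
    ).toDF("id", "fName", "lName")

    // One aggregate expression per column, all inside a single agg() call.
    // The aliases restore the original column names in the result.
    val grouped = df
      .groupBy("id")
      .agg(
        collect_list("fName").as("fName"),
        collect_list("lName").as("lName")
      )
      .orderBy("id") // sorts the rows only, not the elements inside each list

    grouped.show(truncate = false)
    spark.stop()
  }
}
```

Because both lists are built in the same aggregation, they are generally expected to line up element by element within a group, but Spark does not guarantee element order after a shuffle. If the pairing of first and last names matters, one option is to collect a single struct column (for example collect_list(struct("fName", "lName"))) instead of two separate lists.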


