Retrieve top n in each group of a DataFrame in pyspark
Question:

There's a DataFrame in pyspark with data as below:

    user_id  object_id  score
    user_1   object_1   3
    user_1   object_1   1
    user_1   object_2   2
    user_2   object_1   5
    user_2   object_2   2
    user_2   object_2   6

What I expect is to get back, for each group of rows sharing the same user_id, the 2 records with the highest score. Consequently, the result should look like the following:

    user_id  object_id  score
    user_1   object_1   3
    user_1   object_2   2
    user_2   object_2   6
    user_2   object_1   5

I'm really new to pyspark. Could anyone give me a code snippet or point me to the related documentation?