Apply a function to groupBy data with pyspark

故里飘歌 2021-01-04 19:10

I'm trying to get word counts from a CSV when grouping on another column. My CSV has three columns: id, message and user_id. I read this in and then split the message and s

2 answers
  •  半阙折子戏
    2021-01-04 19:38

    Try:

    from pyspark.sql.functions import explode, collect_list, struct

    # Explode the (already split) message array into one row per word,
    # count occurrences per (user_id, word), then gather the
    # (word, count) pairs back into one list per user.
    df.withColumn("word", explode("message")) \
      .groupBy("user_id", "word").count() \
      .groupBy("user_id") \
      .agg(collect_list(struct("word", "count")))
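To see what that Spark job computes without a cluster, the same per-user word counting can be sketched in plain Python. The sample rows below are hypothetical, mirroring the csv's (id, message, user_id) columns:

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for the csv: (id, message, user_id)
rows = [
    (1, "hello world", "u1"),
    (2, "hello spark", "u1"),
    (3, "world", "u2"),
]

# Like groupBy("user_id", "word").count(): tally words per user.
counts = defaultdict(Counter)
for _id, message, user_id in rows:
    counts[user_id].update(message.split())  # split + explode

# Like collect_list(struct("word", "count")): gather pairs per user.
result = {user: sorted(c.items()) for user, c in counts.items()}
```

Here `result` maps each user_id to its list of (word, count) pairs, which is the shape the `collect_list(struct(...))` aggregation produces.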
    
