I'm trying to get word counts from a CSV while grouping on another column. My CSV has three columns: id, message, and user_id. I read this in and then split the message column into an array of words.
Try:
    from pyspark.sql.functions import explode, collect_list, struct

    result = (
        df.withColumn("word", explode("message"))        # one row per (user_id, word)
          .groupBy("user_id", "word").count()            # count each word per user
          .groupBy("user_id")
          .agg(collect_list(struct("word", "count")))    # gather (word, count) pairs per user
    )
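Spark aside, it can help to see the shape of the result the pipeline produces. Here is a plain-Python sketch of the same split-then-group-then-count logic on a few made-up sample rows (the rows and user ids are hypothetical, just for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical sample rows mirroring the CSV layout: (id, message, user_id)
rows = [
    (1, "hello world hello", "u1"),
    (2, "world peace", "u1"),
    (3, "hello there", "u2"),
]

# Group word counts by user_id, mirroring the explode + groupBy pipeline
counts = defaultdict(Counter)
for _id, message, user_id in rows:
    counts[user_id].update(message.split())  # split the message into words

print(dict(counts["u1"]))  # {'hello': 2, 'world': 2, 'peace': 1}
print(dict(counts["u2"]))  # {'hello': 1, 'there': 1}
```

Each user ends up with a mapping of word to count, which is exactly what the `collect_list(struct("word", "count"))` aggregation builds per `user_id` in the Spark version.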