Let's say I have a DataFrame with a column for users and another column for words they've written:
DataFrame
from pyspark.sql import functions as F
df.groupby("user").agg(F.collect_list("word"))
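To make the intended result concrete, here is a minimal plain-Python sketch of what that aggregation computes: one list of words per user. It does not use Spark; the sample rows and the `collect_words` helper are hypothetical stand-ins for the DataFrame above.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the (user, word) DataFrame.
rows = [("alice", "hello"), ("bob", "spark"), ("alice", "world")]


def collect_words(rows):
    """Group words by user, mimicking groupby("user") + collect_list("word")."""
    grouped = defaultdict(list)
    for user, word in rows:
        # Append in the order rows are seen; this sketch is single-process,
        # so input order is preserved.
        grouped[user].append(word)
    return dict(grouped)


result = collect_words(rows)
print(result)
```

Unlike this sketch, Spark's `collect_list` does not guarantee the order of the collected elements, since that order depends on how rows arrive after a shuffle.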