I have a DataFrame whose data I am pasting below:
+---------------+--------------+----------+------------+----------+
|name | DateTime|
Solution:
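import pyspark.sql.functions as f
from pyspark.sql import Window

# One window per Seq, ordered by DateTime, so each row sees the rows up to itself
w = Window.partitionBy("Seq").orderBy("DateTime")
df.select("*", f.concat_ws("", f.collect_set(f.col("name")).over(w)).alias("cummuliative_name")).show()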
Explanation
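collect_set() - Applied over the window w, this function returns an array of the distinct name values seen so far in the partition, e.g. ["abc","xyz","rafa",{},"experience"]. The order of the elements in that array is not guaranteed.
concat_ws() - This function takes the array produced by collect_set() and joins its elements into a single string, using its first argument as the separator. With "" the values are concatenated directly; with ", " you would get abc, xyz, rafa, {}, experience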
Note: collect_set() drops duplicate values. If the same name can occur more than once and you want to keep every occurrence, use collect_list() instead.
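If it helps to see what the window is doing, here is a rough plain-Python sketch (no Spark needed) of the same running concatenation. The function name cumulative_names, the (seq, date_time, name) tuple layout and the sample rows are all made up for illustration; they are not taken from the question's data. It assumes the default window frame that Spark uses when orderBy is given (everything up to and including the current DateTime), and it keeps first-seen order when de-duplicating, which Spark's collect_set() does not promise.

```python
from collections import defaultdict


def cumulative_names(rows, sep="", distinct=True):
    """Emulate concat_ws(sep, collect_set/collect_list(name).over(w)).

    rows: list of (seq, date_time, name) tuples (hypothetical layout).
    distinct=True mimics collect_set, distinct=False mimics collect_list.
    """
    # partitionBy("Seq")
    by_seq = defaultdict(list)
    for seq, date_time, name in rows:
        by_seq[seq].append((date_time, name))

    out = []
    for seq, items in by_seq.items():
        # orderBy("DateTime")
        items.sort(key=lambda item: item[0])
        for date_time, name in items:
            # Assumed default frame: all rows with DateTime <= current row
            seen = [n for dt, n in items if dt <= date_time]
            if distinct:
                # Drops duplicates but keeps first-seen order; Spark's
                # collect_set gives no ordering guarantee.
                seen = list(dict.fromkeys(seen))
            out.append((seq, date_time, name, sep.join(seen)))
    return out


# Made-up sample rows, not the data from the question
rows = [
    (1, "2020-01-01 10:00", "abc"),
    (1, "2020-01-01 11:00", "xyz"),
    (1, "2020-01-01 12:00", "abc"),
    (2, "2020-01-01 09:00", "rafa"),
]

for row in cumulative_names(rows, sep=", ", distinct=False):
    print(row)
```

Switching distinct between True and False shows the difference the note above describes: the third row of Seq 1 repeats "abc" in the collect_list-style result and drops the repeat in the collect_set-style one.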