How to concatenate the strings of one DataFrame column into a new column that holds the cumulative (running) value of the original column

暗喜 2021-01-16 06:43

I have a DataFrame whose data I am pasting below:

+---------------+--------------+----------+------------+----------+
|name           |      DateTime|                


        
2 Answers
  •  不要未来只要你来
    2021-01-16 07:40

    Solution:

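    import pyspark.sql.functions as f
    from pyspark.sql import Window

    # Window over the rows of each Seq, ordered by DateTime
    w = Window.partitionBy("Seq").orderBy("DateTime")

    # Join every name collected up to the current row into one string
    df.select( "*", f.concat_ws( "", f.collect_set(f.col("name")).over(w) ).alias("cumulative_name") ).show()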

    Explanation

    collect_set() - Applied over the window, this function returns an array of the distinct name values seen up to the current row, e.g. ["abc","xyz","rafa",{},"experience"].

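    concat_ws() - This function takes the array returned by collect_set() and joins its elements into a single string using the separator given as the first argument. With ", " as the separator the result is abc, xyz, rafa, {}, experience; with the empty separator "" the elements are joined with nothing between them.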

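    Note: collect_set() drops duplicate values and does not guarantee the order of the elements. Use it only if name has no duplicates; otherwise use collect_list(), which keeps every value.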
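
    To make the behaviour of the window expression easier to see, here is a minimal plain-Python sketch (no Spark needed) of the same idea: a running concatenation of name within each Seq, ordered by DateTime. The function name cumulative_names, the sample rows, and the dedupe flag are hypothetical and only for illustration. dedupe=False roughly mirrors collect_list(), dedupe=True roughly mirrors collect_set(), except that this sketch keeps first-seen order while Spark gives no ordering guarantee for collect_set(). The sketch also assumes DateTime values are unique within a Seq.

```python
from collections import defaultdict


def cumulative_names(rows, sep="", dedupe=False):
    """Running concatenation of `name` per `seq`, ordered by `date_time`.

    rows: iterable of (seq, date_time, name) tuples (hypothetical layout).
    dedupe=False keeps every value (like collect_list);
    dedupe=True keeps only the first occurrence of each value.
    """
    collected = defaultdict(list)  # seq -> names gathered so far
    out = []
    # Sort by partition key first, then by the ordering key.
    for seq, date_time, name in sorted(rows, key=lambda r: (r[0], r[1])):
        names = collected[seq]
        if not (dedupe and name in names):
            names.append(name)
        # Join everything collected up to and including the current row.
        out.append((seq, date_time, name, sep.join(names)))
    return out


# Made-up sample data, not taken from the question.
rows = [
    (1, "2021-01-01 10:00", "abc"),
    (1, "2021-01-01 10:05", "xyz"),
    (1, "2021-01-01 10:10", "abc"),
    (2, "2021-01-01 09:00", "rafa"),
]

result = cumulative_names(rows, sep=", ")
deduped = cumulative_names(rows, sep=", ", dedupe=True)

for row in result:
    print(row)
```

    The last column of each printed row is what the cumulative column would hold for that row. Comparing result with deduped shows why duplicates in name matter when choosing between the two collect functions.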
