How can I concatenate the rows in a PySpark DataFrame with multiple columns using groupby and aggregate?
Question

I have a PySpark DataFrame with multiple columns, for example the one below.

    from pyspark.sql import Row

    # sc and sqlContext are assumed to be the SparkContext and SQLContext
    # that the PySpark shell provides.
    l = [('Jack', "a", "p"), ('Jack', "b", "q"), ('Bell', "c", "r"), ('Bell', "d", "s")]
    rdd = sc.parallelize(l)
    score_rdd = rdd.map(lambda x: Row(name=x[0], letters1=x[1], letters2=x[2]))
    score_card = sqlContext.createDataFrame(score_rdd)

    +----+--------+--------+
    |name|letters1|letters2|
    +----+--------+--------+
    |Jack|       a|       p|
    |Jack|       b|       q|
    |Bell|       c|       r|
    |Bell|       d|       s|
    +----+--------+--------+

Now I want to