Question: I have a dataframe where I need to first group the rows by id and then compute a weighted average, as shown in the output calculation below. What is an efficient way to do that in PySpark?

    data = sc.parallelize([
        [111, 3, 0.4],
        [111, 4, 0.3],
        [222, 2, 0.2],
        [222, 3, 0.2],
        [222, 4, 0.5]]
    ).toDF(['id', 'val', 'weight'])
    data.show()

    +---+---+------+
    | id|val|weight|
    +---+---+------+
    |111|  3|   0.4|
    |111|  4|   0.3|
    |222|  2|   0.2|
    |222|  3|   0.2|
    |222|  4|   0.5|
    +---+---+------+

Expected output:

    id   weighted_val
    111  (3*0.4 + 4*0.3) / (0.4 + 0.3)
    222  (2*0.2 + 3*0.2 + 4*0.5) / (0.2 + 0.2 + 0.5)
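
A minimal sketch of one way to express this, assuming the `data` dataframe defined above and an active Spark session: group by `id` and aggregate `sum(val * weight) / sum(weight)` using built-in column expressions (the name `weighted_val` just mirrors the expected output).

    from pyspark.sql import functions as F

    # Weighted average per id: sum(val * weight) / sum(weight),
    # reusing the `data` dataframe from the question above.
    weighted = (
        data.groupBy('id')
            .agg(
                (F.sum(F.col('val') * F.col('weight')) / F.sum('weight'))
                .alias('weighted_val')
            )
    )
    weighted.show()

Keeping the whole calculation in built-in aggregate expressions lets Spark's optimizer handle it and avoids a Python UDF, which is usually the efficiency concern with per-group computations like this.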