Custom function over a PySpark DataFrame
Question

I'm trying to apply a custom function over the rows of a PySpark DataFrame. The function takes a row and two other vectors of the same dimension, and it returns the sum of the values of the third vector at every position where the row's value matches the second vector's value.

    import pandas as pd
    import numpy as np

The function:

    def V_sum(row, b, c):
        return float(np.sum(c[row == b]))

What I want to achieve is simple with pandas:

    pd_df = pd.DataFrame([[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0]],
                         columns=['t1', 't2', 't3', 't4'])
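For context, here is a minimal sketch of one way this row-wise logic could be expressed in PySpark, wrapping the function in a plain Python UDF applied to an array column. The vectors b and c, the column names, and the udf wrapper are illustrative assumptions, not the asker's own attempt:

    # Sketch: apply V_sum per row via a UDF over an array column.
    # The vectors b and c below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, array
    from pyspark.sql.types import FloatType
    import numpy as np

    spark = SparkSession.builder.getOrCreate()

    sp_df = spark.createDataFrame(
        [[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0]],
        schema=['t1', 't2', 't3', 't4'])

    b = [1, 0, 1, 0]          # hypothetical comparison vector
    c = [0.5, 1.5, 2.5, 3.5]  # hypothetical value vector

    def V_sum(row, b, c):
        # Spark passes the array column in as a Python list,
        # so convert everything to numpy arrays before masking.
        row = np.asarray(row)
        return float(np.sum(np.asarray(c)[row == np.asarray(b)]))

    # Bind b and c in a closure; the UDF itself only sees the row.
    v_sum_udf = udf(lambda row: V_sum(row, b, c), FloatType())

    result = sp_df.withColumn('v_sum', v_sum_udf(array('t1', 't2', 't3', 't4')))
    result.show()

Note that a plain Python UDF serializes each row to Python and back, so on large data a pandas (vectorized) UDF would usually be the faster choice; the version above is only meant to mirror the pandas logic as directly as possible.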