Pyspark: Pass multiple columns in UDF

有刺的猬 2020-11-30 02:47

I am writing a user-defined function (UDF) that takes all the columns of a DataFrame except the first one and computes their sum (or performs some other operation). Now the dataframe can sometimes

慢半拍i (original poster)

2020-11-30 03:51
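If you don't want to type out every column name and would rather pass all the columns into your UDF at once, pack them into a single `struct` built with a list comprehension. The UDF then receives each row as one `Row` object, which you can slice to skip the first column.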
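```
from pyspark.sql.functions import struct, udf
from pyspark.sql.types import DoubleType
# udf() defaults to StringType, so declare DoubleType; float() matters because a DoubleType UDF returning a Python int yields null.
sum_udf = udf(lambda row: float(sum(row[1:])), DoubleType())  # row is a Row; row[1:] skips the first column
df_sum = df.withColumn("result", sum_udf(struct([df[c] for c in df.columns])))
```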
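A minimal, self-contained sketch building on the answer above. It assumes every column after the first is numeric (int or float) and that nulls should be skipped rather than failing the job. The helper names `sum_except_first` and `add_row_udf_column` are hypothetical, not part of PySpark. The per-row logic is a plain Python function, so it can be tested without Spark; the Spark imports are deferred to the wrapper for that reason.

```python
from typing import Any, Callable, Optional, Sequence


def sum_except_first(row: Sequence[Any]) -> Optional[float]:
    """Sum every field of `row` except the first, skipping nulls (None).

    Returns None when no non-null value remains, which Spark stores as null.
    """
    values = [v for v in row[1:] if v is not None]
    # float(): a DoubleType UDF that returns a Python int would yield null.
    return float(sum(values)) if values else None


def add_row_udf_column(
    df,
    func: Callable[[Sequence[Any]], Optional[float]] = sum_except_first,
    out_col: str = "result",
):
    """Return `df` with a new column `out_col` holding func(row) for each row.

    Hypothetical helper: Spark is imported here so the row logic above
    can be used and tested without a running Spark session.
    """
    from pyspark.sql.functions import struct, udf
    from pyspark.sql.types import DoubleType

    row_udf = udf(func, DoubleType())
    # Pack all columns into one struct column; the UDF receives it as a Row.
    return df.withColumn(out_col, row_udf(struct(*[df[c] for c in df.columns])))


# Usage (needs a SparkSession named `spark`):
#   df = spark.createDataFrame([("a", 1, 2.5, None), ("b", 2, None, 4.0)],
#                              ["id", "x", "y", "z"])
#   add_row_udf_column(df).show()
```

To apply a different operation, pass another row function as `func`; it should return a `float` or `None` to match the declared `DoubleType`. Skipping `None` is deliberate: Python's built-in `sum` raises `TypeError` on `None`, so a single null cell would otherwise make the Spark task fail.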