PySpark: Pass multiple columns in UDF

有刺的猬 2020-11-30 02:47

I am writing a user-defined function (UDF) that takes all the columns of a dataframe except the first one and computes their sum (or any other operation). Now the dataframe can sometimes

6 answers
  •  北荒 (original poster)
    2020-11-30 03:50

    Another simple way, without using an array or a struct: pass each column to the UDF as a separate argument.

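    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    # Reuse the active session (or start one) so the snippet runs on its own.
    spark = SparkSession.builder.getOrCreate()

    # Named add_cols rather than sum to avoid shadowing Python's built-in sum().
    def add_cols(x, y):
        return x + y

    # Wrap the Python function as a UDF that returns an integer column.
    sum_cols = udf(add_cols, IntegerType())

    a = spark.createDataFrame([(101, 1, 16)], ['ID', 'A', 'B'])
    a.show()
    # Each input column is passed to the UDF as a separate argument.
    a.withColumn('Result', sum_cols('A', 'B')).show()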
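
    The two-argument version above fixes the number of columns in advance. If the dataframe can have a varying number of columns, one possible approach is to give the Python function a variable-length argument list and unpack the column names when calling the UDF. The following is only a sketch: the names sum_all, with_row_sum and result_col are made up for illustration, and skipping None (SQL NULL) values is an assumption about the desired behaviour, not something stated in the question.

```python
def sum_all(*values):
    """Add up any number of values; None (SQL NULL) is skipped (an assumption)."""
    return sum(v for v in values if v is not None)


def with_row_sum(df, result_col="Result"):
    """Append a column holding the sum of every column of df except the first.

    with_row_sum and result_col are hypothetical names used for illustration.
    """
    # Imported here so that sum_all stays usable without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    sum_udf = udf(sum_all, IntegerType())
    value_cols = df.columns[1:]  # skip the first column (e.g. 'ID')
    # Unpack the column names so each column becomes one UDF argument.
    return df.withColumn(result_col, sum_udf(*value_cols))
```

    With the dataframe a from the example above, with_row_sum(a).show() should add a Result column containing 17, and the same call should keep working if more numeric columns are added after ID.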
    
