How to pass a constant value to Python UDF?

执笔经年 2020-12-06 18:35

I was wondering whether it is possible to create a UDF that receives two arguments: a Column and another variable (Object,Dictionary<

1 answer
  • Everything passed to a UDF is interpreted as a column or a column name. If you want to pass a literal, you have two options:

    1. Pass argument using currying:

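      from pyspark.sql.functions import col, udf
      from pyspark.sql.types import BooleanType

      def comparatorUDF(n):
          # n is captured in the lambda's closure and serialized with the UDF
          return udf(lambda c: c == n, BooleanType())

      df.where(comparatorUDF("Bonsanto")(col("name")))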
      

      This can be used with an argument of any type as long as it is serializable.

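    2. Use a SQL literal and the current implementation, that is, a UDF that takes both values as column arguments rather than the curried version above:

      from pyspark.sql.functions import col, lit

      # lit() wraps the constant in a Column so it can be passed as a UDF argument
      df.where(comparatorUDF(col("name"), lit("Bonsanto")))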
      

      This works only with supported types (strings, numerics, booleans). For non-atomic types, see "How to add a constant column in a Spark DataFrame?"
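
    The currying approach from option 1 also covers the dictionary case mentioned in the question. The following is only a minimal sketch under stated assumptions: `make_lookup`, `make_lookup_udf`, the `codes` mapping, and the column names are hypothetical names introduced for illustration, and the return type is assumed to be a string.

```python
def make_lookup(mapping, default=None):
    """Return a one-argument function with `mapping` captured in its closure."""
    def lookup(value):
        # dict.get returns `default` for missing keys (and for None / null input)
        return mapping.get(value, default)
    return lookup


def make_lookup_udf(mapping, default=None):
    """Wrap the closure in a Spark UDF (hypothetical helper, assumes string results)."""
    # Imported lazily so the plain-Python closure above can be tried without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(make_lookup(mapping, default), StringType())


# Usage with an existing DataFrame `df` that has a "code" column (assumed):
#   from pyspark.sql.functions import col
#   codes = {"VE": "Venezuela", "CO": "Colombia"}
#   df.withColumn("country", make_lookup_udf(codes, "unknown")(col("code")))
```

    As with `comparatorUDF`, the dictionary never becomes a column; it travels with the serialized function, so it has to be serializable and is best kept reasonably small.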
