How to deal with Spark UDF input/output of primitive nullable type

混江龙づ霸主 提交于 2019-11-30 18:54:43

问题


The issues:

1) Spark doesn't call UDF if input is column of primitive type that contains null:

inputDF.show()

+-----+
|  x  |
+-----+
| null|
|  1.0|
+-----+

inputDF
  .withColumn("y",
     udf { (x: Double) => 2.0 }.apply($"x") // will not be invoked if $"x" == null
  )
  .show()

+-----+-----+
|  x  |  y  |
+-----+-----+
| null| null|
|  1.0|  2.0|
+-----+-----+

2) Can't produce null from UDF as a column of primitive type:

udf { (x: String) => null: Double } // compile error


回答1:


Accordingly to the docs:

Note that if you use primitive parameters, you are not able to check if it is null or not, and the UDF will return null for you if the primitive input is null. Use boxed type or [[Option]] if you wanna do the null-handling yourself.


So, the easiest solution is just to use boxed types if your UDF input is nullable column of primitive type OR/AND you need to output null from UDF as a column of primitive type:

inputDF
  .withColumn("y",
     udf { (x: java.lang.Double) => 
       (if (x == null) 1 else null): java.lang.Integer
     }.apply($"x")
  )
  .show()

+-----+-----+
|  x  |  y  |
+-----+-----+
| null| null|
|  1.0|  2.0|
+-----+-----+



回答2:


I would also use Artur's solution, but there is also another way without using javas wrapper classes by using struct:

import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.Row

inputDF
  .withColumn("y",
     udf { (r: Row) => 
       if (r.isNullAt(0)) Some(1) else None
     }.apply(struct($"x"))
  )
  .show()



回答3:


Based on the solution provided at SparkSQL: How to deal with null values in user defined function? by @zero323, an alternative way to achieve the requested result is:

import scala.util.Try
val udfHandlingNulls udf((x: Double) => Try(2.0).toOption)
inputDF.withColumn("y", udfHandlingNulls($"x")).show()


来源:https://stackoverflow.com/questions/42791912/how-to-deal-with-spark-udf-input-output-of-primitive-nullable-type

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!