How to remove nulls with array_remove Spark SQL Built-in Function

攒了一身酷 2020-12-16 17:33

Spark 2.4 introduced useful new Spark SQL functions for working with arrays, but I was a little puzzled when I found out that the result of: select array_remove(array(1, 2,

3 answers
  •  轻奢々 (original poster)
    2020-12-16 17:58

    To answer your first question, "Is this an expected behavior?": yes. The official notebook (https://docs.databricks.com/_static/notebooks/apache-spark-2.4-functions.html) describes the function as "Remove all elements that equal to the given element from the given array." NULL represents an undefined value, so an equality comparison against NULL is itself undefined, and the result is not defined either.

    So I think NULLs are outside the purview of this function.
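To make the reasoning concrete, here is a rough plain-Python model of that behavior, with None standing in for NULL. The names sql_equals and array_remove_model are made up for illustration, and this is not Spark's actual implementation. It assumes that a NULL argument makes the whole result NULL and that only elements whose comparison is definitely true get removed.

```python
from typing import List, Optional


def sql_equals(a, b) -> Optional[bool]:
    """SQL-style equality: None (NULL) on either side gives None (unknown)."""
    if a is None or b is None:
        return None
    return a == b


def array_remove_model(arr: Optional[List], element) -> Optional[List]:
    """Hypothetical model of array_remove semantics, not Spark's code."""
    # Assumption: a NULL array or a NULL element to remove yields NULL.
    if arr is None or element is None:
        return None
    # Remove only elements whose comparison is definitely true.
    # A NULL element compares as unknown, so it stays in the array.
    return [x for x in arr if sql_equals(x, element) is not True]
```

Under this model, removing 3 leaves the None in place, and asking to remove None returns None for the whole array rather than a cleaned array.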

    It is good that you found a way to work around this. You can also use spark.sql("""SELECT array_except(array(1, 2, 3, 3, null, 3, 3, 3, 4, 5), array(null))""").show(), but the downside is that the result will not contain duplicates.
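If duplicates need to be kept, one option that may fit is the filter higher-order function, which is also available in Spark SQL from 2.4. The sketch below only builds the query string. The helper names remove_nulls_query and show_without_nulls are hypothetical, and it assumes an existing SparkSession is passed in as spark.

```python
def remove_nulls_query(array_sql: str) -> str:
    """Build a Spark SQL query that drops NULL elements but keeps duplicates."""
    # filter(...) keeps every element for which the lambda is true,
    # so repeated non-null values survive, unlike with array_except.
    return f"SELECT filter({array_sql}, x -> x IS NOT NULL) AS cleaned"


QUERY = remove_nulls_query("array(1, 2, 3, 3, null, 3, 3, 3, 4, 5)")


def show_without_nulls(spark) -> None:
    """Run the query on an existing SparkSession (assumed, not created here)."""
    spark.sql(QUERY).show(truncate=False)
```

Calling show_without_nulls(spark) in a PySpark shell or notebook should print the array with the null dropped and the repeated 3s still present.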
