Spark 2.4 introduced new useful Spark SQL functions involving arrays but I was a little bit puzzled when I find out that the result of:
select array_remove(array(1, 2,
To answer your first question, Is this an expected behavior? , Yes. Because the official notebook(https://docs.databricks.com/_static/notebooks/apache-spark-2.4-functions.html) points out "Remove all elements that equal to the given element from the given array." and
NULL
corresponds to undefined values & the results will also not defined.
So,I think NULL
s are out of the purview of this function.
Better you found out a way to overcome this, you can also use spark.sql("""SELECT array_except(array(1, 2, 3, 3, null, 3, 3,3, 4, 5), array(null))""").show()
but the downside is that the result will be without duplicates.