Spark 2.4 introduced several useful Spark SQL functions for arrays, but I was a little puzzled when I found out that the result of:
select array_remove(array(1, 2,
In Spark 2 (2.4+, where `array_remove` is available), you can do something like this:
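```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql._

/**
 * Builds an array from the given columns, dropping null values.
 *
 * Nulls are first replaced with `nullPlaceholder` (an arbitrary sentinel that must never
 * occur in the real data, because matching values are removed as well) and then removed
 * with `array_remove`.
 * For non-string element types, including complex types, you are responsible for passing
 * a nullPlaceholder of the same type as the array elements; otherwise Spark's type
 * coercion may change the element type or the query may fail to analyze.
 */
def non_null_array(columns: Seq[Column], nullPlaceholder: Any = "רכוב כל יום"): Column =
  array_remove(array(columns.map(c => coalesce(c, lit(nullPlaceholder))): _*), nullPlaceholder)
```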
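As a usage sketch, assuming a local `SparkSession` and three hypothetical nullable string columns `c1`, `c2` and `c3`, the helper is repeated below so the block compiles on its own. For the first row it should produce `[a, c]`, and for the all-null row an empty array rather than null.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions._

object NonNullArrayExample {
  // Same helper as above, repeated so this sketch is self-contained.
  def non_null_array(columns: Seq[Column], nullPlaceholder: Any = "רכוב כל יום"): Column =
    array_remove(array(columns.map(c => coalesce(c, lit(nullPlaceholder))): _*), nullPlaceholder)

  /** Builds an array from three nullable string columns and returns the collected arrays. */
  def run(spark: SparkSession): List[List[String]] = {
    import spark.implicits._

    // Hypothetical sample data: any of the columns may be null.
    val df = Seq[(Option[String], Option[String], Option[String])](
      (Some("a"), None, Some("c")),
      (None, None, None)
    ).toDF("c1", "c2", "c3")

    df.select(non_null_array(Seq($"c1", $"c2", $"c3")).as("values"))
      .collect()
      .map(_.getSeq[String](0).toList) // convert each array value to a Scala List
      .toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("non-null-array").getOrCreate()
    try run(spark).foreach(println) // prints one List per row
    finally spark.stop()
  }
}
```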
Spark 3.0 adds a `filter` function for arrays to the Scala API, so you can simply do:
```scala
df.select(filter(col("array_column"), x => x.isNotNull))
```
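Here is a self-contained sketch of the Spark 3 approach, assuming a local `SparkSession` and a hypothetical one-row DataFrame built with SQL. It also shows the same `filter` higher-order function written as a SQL lambda through `expr`: Spark SQL has supported that form since 2.4, even though the typed Scala `filter` wrapper only arrived in 3.0, so this block itself needs Spark 3.0+ to compile.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, filter}

object FilterNullsExample {
  /** Returns the filtered array computed via the Scala API and via a SQL expression. */
  def run(spark: SparkSession): (List[Int], List[Int]) = {
    // Hypothetical one-row DataFrame whose array contains a null element.
    val df = spark.sql("SELECT array(1, NULL, 2) AS array_column")

    // Spark 3.0+: Scala API for the `filter` higher-order function.
    val viaScalaApi = df.select(filter(col("array_column"), x => x.isNotNull).as("filtered"))

    // Spark 2.4+: the same higher-order function written as a SQL lambda.
    val viaSqlExpr = df.select(expr("filter(array_column, x -> x IS NOT NULL)").as("filtered"))

    (viaScalaApi.head().getSeq[Int](0).toList, viaSqlExpr.head().getSeq[Int](0).toList)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("filter-nulls").getOrCreate()
    try {
      val (apiResult, sqlResult) = run(spark)
      println(apiResult) // null element dropped by the Scala API version
      println(sqlResult) // null element dropped by the SQL lambda version
    } finally spark.stop()
  }
}
```

Unlike the placeholder approach, `filter` needs no sentinel value, so it cannot accidentally drop real data that happens to equal the placeholder.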