I have a Spark data frame where one column is an array of integers. The column is nullable because it is coming from a left outer join. I want to convert all null values to
An UDF-free alternative to use when the data type you want your array elements in can not be cast from StringType is the following:
import pyspark.sql.types as T
import pyspark.sql.functions as F
df.withColumn(
"myCol",
F.coalesce(
F.col("myCol"),
F.from_json(F.lit("[]"), T.ArrayType(T.IntegerType()))
)
)
You can replace IntegerType() with whichever data type, also complex ones.