Convert null values to empty array in Spark DataFrame


I have a Spark DataFrame where one column is an array of integers. The column is nullable because it comes from a left outer join. I want to convert all null values to empty arrays.
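Below is a minimal sketch of the setup described (the users/purchases DataFrames and column names are made up for illustration); after the left outer join, unmatched rows carry null in the array column rather than an empty array:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    users = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    purchases = spark.createDataFrame([(1, [10, 20, 30])], ["id", "item_ids"])

    # id=2 has no matching purchases, so its "item_ids" value is null
    joined = users.join(purchases, on="id", how="left_outer")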

3 Answers
  •  感情败类
    2020-12-01 11:20

    A UDF-free alternative, useful when the data type you want for your array elements cannot be cast from StringType, is the following:

    import pyspark.sql.types as T
    import pyspark.sql.functions as F
    
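    # coalesce() keeps the existing array when it is not null; otherwise it
    # falls back to an empty array built by parsing the JSON literal "[]"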
    df.withColumn(
        "myCol",
        F.coalesce(
            F.col("myCol"),
            F.from_json(F.lit("[]"), T.ArrayType(T.IntegerType()))
        )
    )
    

    You can replace IntegerType() with any other data type, including complex ones.
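    For example, here is a sketch of the same pattern with a complex element type, an array of structs (the field names id and score are hypothetical); from_json parses the literal "[]" into an empty array of whatever element type the schema specifies:

    import pyspark.sql.types as T
    import pyspark.sql.functions as F
    
    # Hypothetical complex element type: each array element is a struct
    element_type = T.StructType([
        T.StructField("id", T.IntegerType()),
        T.StructField("score", T.DoubleType()),
    ])
    
    df = df.withColumn(
        "myCol",
        F.coalesce(
            F.col("myCol"),
            F.from_json(F.lit("[]"), T.ArrayType(element_type)),
        ),
    )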
