How to convert empty arrays to nulls?

Happy的楠姐 2021-01-13 18:28

I have the DataFrame below and I need to convert empty arrays to null.

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|11         


        
7 answers
  •  盖世英雄少女心
    2021-01-13 18:59

    There is no easy solution like df.na.fill here. One way is to loop over all relevant columns and replace the values where appropriate. Here is an example using foldLeft in Scala:

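    import org.apache.spark.sql.functions.{col, size, when}

    // Names of all array-typed columns
    val columns = df.schema.filter(_.dataType.typeName == "array").map(_.name)

    // Replace empty arrays with null, one column at a time
    val df2 = columns.foldLeft(df)((acc, colname) => acc.withColumn(colname,
        when(size(col(colname)) === 0, null).otherwise(col(colname))))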
    

    First, the names of all array-typed columns are extracted, and then these columns are iterated over. Since the size function is only defined for collection-typed columns (arrays and maps), this filtering step is necessary, and it also avoids looping over every column.

    Using the dataframe:

    +----+--------+-----+
    |  id|    col1| col2|
    +----+--------+-----+
    |1110|[12, 11]|   []|
    |1111|      []| [11]|
    |1112|   [123]|[321]|
    +----+--------+-----+
    

    The result is as follows:

    +----+--------+-----+
    |  id|    col1| col2|
    +----+--------+-----+
    |1110|[12, 11]| null|
    |1111|    null| [11]|
    |1112|   [123]|[321]|
    +----+--------+-----+
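
    To try this end to end, here is a minimal self-contained sketch that builds the example dataframe above and applies the same foldLeft approach. The object name EmptyArraysToNull, the helper emptyArraysToNull, and the local SparkSession settings are my own choices for illustration, not part of the original question. It matches on ArrayType instead of comparing type names and uses lit(null) instead of a bare null, which should behave the same way.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, lit, size, when}
    import org.apache.spark.sql.types.ArrayType

    // Hypothetical wrapper object, named for illustration only.
    object EmptyArraysToNull {

      // Replace empty arrays with null in every array-typed column.
      def emptyArraysToNull(df: DataFrame): DataFrame = {
        // Collect the names of all columns whose type is an array.
        val arrayColumns = df.schema.fields.collect {
          case f if f.dataType.isInstanceOf[ArrayType] => f.name
        }
        // Rewrite each array column: empty array -> null, anything else unchanged.
        arrayColumns.foldLeft(df) { (acc, name) =>
          acc.withColumn(name, when(size(col(name)) === 0, lit(null)).otherwise(col(name)))
        }
      }

      def main(args: Array[String]): Unit = {
        // Assumption: a local session is enough for a quick test.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("empty-arrays-to-null")
          .getOrCreate()
        import spark.implicits._

        // Same data as the example dataframe in this answer.
        val df = Seq(
          (1110, Seq(12, 11), Seq.empty[Int]),
          (1111, Seq.empty[Int], Seq(11)),
          (1112, Seq(123), Seq(321))
        ).toDF("id", "col1", "col2")

        emptyArraysToNull(df).show()
        spark.stop()
      }
    }
    ```

    Arrays that are already null are left as they are, because the size comparison is not true for them and the otherwise branch returns the original value.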
    
