How to convert empty arrays to nulls?

Happy的楠姐 2021-01-13 18:28

I have the dataframe below and I need to convert the empty arrays to null.

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|1110| [12, 45]|   [50, 55]|
|1111|       []|         []|
|1112| [45, 46]|   [50, 50]|
|1113|       []|         []|
+----+---------+-----------+
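For reference, the dataframe can be reproduced with something like the snippet below (the explicit schema is an assumption: Spark cannot infer the element type of an empty Python list):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, LongType, ArrayType

    spark = SparkSession.builder.getOrCreate()

    # Explicit schema: empty arrays carry no inferable element type.
    schema = StructType([
        StructField("id", LongType()),
        StructField("count(AS)", ArrayType(LongType())),
        StructField("count(asdr)", ArrayType(LongType())),
    ])

    df = spark.createDataFrame(
        [(1110, [12, 45], [50, 55]),
         (1111, [], []),
         (1112, [45, 46], [50, 50]),
         (1113, [], [])],
        schema,
    )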
7 Answers
  •  庸人自扰
    2021-01-13 18:57

    For your given dataframe, you can simply do the following:

    from pyspark.sql import functions as F
    df.withColumn("count(AS)", F.when((F.size(F.col("count(AS)")) == 0), F.lit(None)).otherwise(F.col("count(AS)"))) \
        .withColumn("count(asdr)", F.when((F.size(F.col("count(asdr)")) == 0), F.lit(None)).otherwise(F.col("count(asdr)"))).show()
    

    You should get the output dataframe as:

    +----+---------+-----------+
    |  id|count(AS)|count(asdr)|
    +----+---------+-----------+
    |1110| [12, 45]|   [50, 55]|
    |1111|     null|       null|
    |1112| [45, 46]|   [50, 50]|
    |1113|     null|       null|
    +----+---------+-----------+
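    Note that only the values change; the schema still reports the array types. A quick check with printSchema should show something like the following (assuming the bigint arrays above):

    df.printSchema()
    # root
    #  |-- id: long (nullable = true)
    #  |-- count(AS): array (nullable = true)
    #  |    |-- element: long (containsNull = true)
    #  |-- count(asdr): array (nullable = true)
    #  |    |-- element: long (containsNull = true)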
    

    Updated

    In case you have more than two array columns and want to apply the above logic dynamically, you can use the following:

    from pyspark.sql import functions as F
    for c in df.dtypes:
        if "array" in c[1]:
            df = df.withColumn(c[0], F.when((F.size(F.col(c[0])) == 0), F.lit(None)).otherwise(F.col(c[0])))
    df.show()
    

    Here,
    df.dtypes gives you a list of tuples of column name and datatype. For the dataframe in the question it would be

    [('id', 'bigint'), ('count(AS)', 'array<bigint>'), ('count(asdr)', 'array<bigint>')]
    

    withColumn is applied only to the array columns ("array" in c[1]). F.size(F.col(c[0])) == 0 is the condition for the when function, which checks the size of the array: if the array is empty, None is populated; otherwise the original value is kept. The loop applies this to every array column.
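    Equivalently, the loop can be collapsed into a single select. A minimal sketch of the same logic in one pass:

    from pyspark.sql import functions as F

    # One expression per column: null-out empty arrays,
    # pass non-array columns through unchanged.
    df = df.select([
        F.when(F.size(F.col(name)) == 0, F.lit(None))
         .otherwise(F.col(name)).alias(name)
        if "array" in dtype else F.col(name)
        for name, dtype in df.dtypes
    ])
    df.show()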
