I have the DataFrame below and need to convert its empty arrays to null.
+----+---------+-----------+
| id|count(AS)|count(asdr)|
+----+---------+-----------+
|11
There is no built-in shortcut like df.na.fill for this. One way is to loop over all relevant columns and replace the values where appropriate. An example using foldLeft in Scala:
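import org.apache.spark.sql.functions.{col, size, when}

// Names of all array-typed columns
val columns = df.schema.filter(_.dataType.typeName == "array").map(_.name)
// Replace empty arrays with null, one column at a time
val df2 = columns.foldLeft(df)((acc, colname) => acc.withColumn(colname,
  when(size(col(colname)) === 0, null).otherwise(col(colname))))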
First, all columns of array type are extracted, and then these are iterated over. Because the size function is only defined for collection columns (arrays and maps), this filtering step is necessary; it also avoids looping over every column.
Using the dataframe:
+----+--------+-----+
|  id|    col1| col2|
+----+--------+-----+
|1110|[12, 11]|   []|
|1111|      []| [11]|
|1112|   [123]|[321]|
+----+--------+-----+
The result is as follows:
+----+--------+-----+
|  id|    col1| col2|
+----+--------+-----+
|1110|[12, 11]| null|
|1111|    null| [11]|
|1112|   [123]|[321]|
+----+--------+-----+
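
For anyone who wants to try this end to end, here is a self-contained sketch of the same foldLeft idea. It is a variation rather than the exact code above: it matches on ArrayType instead of comparing the type name as a string, and it casts the null literal to the column's own type so the schema stays unchanged. The object name EmptyArraysToNull, the helper emptyArraysToNull, and the local SparkSession settings are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, size, when}
import org.apache.spark.sql.types.ArrayType

object EmptyArraysToNull {
  // Hypothetical helper: replaces empty arrays with null in every array column.
  def emptyArraysToNull(df: DataFrame): DataFrame =
    df.schema.fields
      .filter(_.dataType.isInstanceOf[ArrayType]) // keep only array-typed columns
      .foldLeft(df) { (acc, field) =>
        acc.withColumn(
          field.name,
          // Cast the null literal so the column keeps its original array type.
          when(size(col(field.name)) === 0, lit(null).cast(field.dataType))
            .otherwise(col(field.name)))
      }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a quick experiment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-arrays-to-null")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as the example DataFrame above.
    val df = Seq(
      (1110, Seq(12, 11), Seq.empty[Int]),
      (1111, Seq.empty[Int], Seq(11)),
      (1112, Seq(123), Seq(321))
    ).toDF("id", "col1", "col2")

    emptyArraysToNull(df).show()
    spark.stop()
  }
}
```

Matching on the type itself avoids relying on the string form of the type name, and the explicit cast makes it clear that the replaced values are nulls of the same array type as the rest of the column.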