Handling empty arrays in pySpark (optional binary element (UTF8) is not a group)
Question

I have a JSON-like structure in Spark which looks as follows:

```
>>> df = spark.read.parquet(good_partition_path)

id: string
some-array: array
    element: struct
        array-field-1: string
        array-field-2: string
```

Depending on the partition, `some-array` might be an empty array for every `id`. When this happens, Spark infers the following schema:

```
>>> df = spark.read.parquet(bad_partition_path)

id: string
some-array: array
    element: string
```

Of course that's a problem if I want to read multiple partitions, because the two schemas conflict and cannot be merged.
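For context, here is a minimal sketch of supplying an explicit schema on read instead of relying on per-partition inference. The schema fields mirror the good partition above; whether this actually avoids the error for the bad partition depends on how the empty arrays were physically written to Parquet:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, ArrayType

spark = SparkSession.builder.getOrCreate()

# Explicit schema matching the "good" partitions; the struct element
# type is spelled out rather than inferred from each partition.
expected_schema = StructType([
    StructField("id", StringType()),
    StructField("some-array", ArrayType(StructType([
        StructField("array-field-1", StringType()),
        StructField("array-field-2", StringType()),
    ]))),
])

# Reading with the explicit schema; good_partition_path is the path
# from the example above.
df = spark.read.schema(expected_schema).parquet(good_partition_path)
df.printSchema()
```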