Spark exception "Complex types not supported" while loading Parquet

Submitted by 假装没事ソ on 2019-12-02 03:46:23

Take 1
SPARK-12854 ("Vectorize Parquet reader") indicates that "ColumnarBatch supports structs and arrays" (cf. GitHub pull request 10820), starting with Spark 2.0.0.

And SPARK-13518 ("Enable vectorized parquet reader by default"), also starting with Spark 2.0.0, deals with the property spark.sql.parquet.enableVectorizedReader (cf. GitHub commit e809074).

My 2 cents: disable that "VectorizedReader" optimization by setting spark.sql.parquet.enableVectorizedReader to false, and see whether the exception still occurs.
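A minimal PySpark-style sketch of that experiment, assuming you already have a SparkSession at hand. The helper name and the path argument are hypothetical; only the property name comes from the JIRA tickets above.

```python
# Property named in SPARK-13518; everything else here is illustrative.
VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"


def read_parquet_without_vectorized_reader(spark, path):
    """Turn the vectorized Parquet reader off for this session, then load `path`.

    `spark` is expected to be a SparkSession (hypothetical helper, not a Spark API).
    """
    # Runtime SQL configuration; it affects reads issued after this call.
    spark.conf.set(VECTORIZED_READER_KEY, "false")
    return spark.read.parquet(path)


# Typical usage (requires a Spark installation):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = read_parquet_without_vectorized_reader(spark, "/path/to/parquet/dir")
```

If the load succeeds with the flag off, the vectorized reader is the likely culprit; the non-vectorized path is usually slower, so treat this as a diagnostic first.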

Take 2
Since the problem has been narrowed down to some empty files that do not have the same schema as the "real" files, my 3 cents: experiment with spark.sql.parquet.mergeSchema to see if the schema from the real files takes precedence after merging.
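One way to run that experiment, sketched in PySpark style. The helper name is hypothetical, and whether the merged schema actually fixes the load is exactly what the experiment is meant to find out.

```python
# Property named in the answer; the helper around it is illustrative only.
MERGE_SCHEMA_KEY = "spark.sql.parquet.mergeSchema"


def read_parquet_with_merged_schema(spark, path):
    """Enable Parquet schema merging for this session, then load `path`.

    `spark` is expected to be a SparkSession (hypothetical helper, not a Spark API).
    """
    # With merging on, Spark reconciles the schemas of the files it reads
    # instead of relying on the schema of a single file.
    spark.conf.set(MERGE_SCHEMA_KEY, "true")
    return spark.read.parquet(path)


# Typical usage (requires a Spark installation):
#   df = read_parquet_with_merged_schema(spark, "/path/to/parquet/dir")
#   df.printSchema()  # check whether the complex columns look as expected
```

The same setting can also be applied to a single read with spark.read.option("mergeSchema", "true"), which avoids changing the session-wide configuration.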

Other than that, you might try to eliminate the empty files at write time with some kind of re-partitioning, e.g. coalesce(1) (OK, 1 is a bit of an extreme value, but you see the point).
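As a rough sketch of the write-time idea, assuming the data is written from a DataFrame: reducing the number of partitions before writing makes it less likely that an empty partition ends up as an empty file. The helper name, its arguments, and the choice of file count are all hypothetical.

```python
def write_parquet_coalesced(df, output_path, num_files):
    """Coalesce `df` to at most `num_files` partitions, then write it as Parquet.

    `df` is expected to be a Spark DataFrame (hypothetical helper, not a Spark API).
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # coalesce() merges existing partitions without a full shuffle, so the
    # write produces at most `num_files` part files.
    df.coalesce(num_files).write.parquet(output_path)


# Typical usage (requires a Spark installation):
#   write_parquet_coalesced(df, "/path/to/output", 8)
```

A value of 1 funnels the whole write through a single task, so in practice a small number sized to the data volume is probably a better starting point.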
