Apache Spark Parquet: Cannot build an empty group

Anonymous (unverified), submitted 2019-12-03 08:59:04

Question:

I am using Apache Spark 2.1.1. I switched from 2.1.0 today, and the behavior was the same there. I have a dataset with the following schema:

    root
     |-- muons: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- reco::Candidate: struct (nullable = true)
     |    |    |-- qx3_: integer (nullable = true)
     |    |    |-- pt_: float (nullable = true)
     |    |    |-- eta_: float (nullable = true)
     |    |    |-- phi_: float (nullable = true)
     |    |    |-- mass_: float (nullable = true)
     |    |    |-- vertex_: struct (nullable = true)
     |    |    |    |-- fCoordinates: struct (nullable = true)
     |    |    |    |    |-- fX: float (nullable = true)
     |    |    |    |    |-- fY: float (nullable = true)
     |    |    |    |    |-- fZ: float (nullable = true)
     |    |    |-- pdgId_: integer (nullable = true)
     |    |    |-- status_: integer (nullable = true)
     |    |    |-- cachePolarFixed_: struct (nullable = true)
     |    |    |-- cacheCartesianFixed_: struct (nullable = true)

As you can see, there are three empty structs in this schema: reco::Candidate, cachePolarFixed_ and cacheCartesianFixed_. I am 100% sure that I can read and manipulate the dataset without problems. However, when I try to write it to disk as Parquet, I get the following exception:

    dsReduced.write.format("parquet").save(outputPathName)

    java.lang.IllegalStateException: Cannot build an empty group
        at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
        at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
        at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
        at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)

So basically, I would like to understand whether this is a bug or intended behavior. I assume it is related to the empty structs. Any help would be really appreciated!
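To confirm which fields are the empty structs, one can walk the schema recursively and report every struct type that has no fields. This is a minimal sketch, assuming Spark's Scala types API (org.apache.spark.sql.types); EmptyStructFinder and emptyStructPaths are hypothetical names, not part of Spark. Array elements and map keys/values are given the segment names element, key and value, as printSchema shows them.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): lists the paths of empty structs in a schema.
object EmptyStructFinder {
  // Joins a parent path and a child name with ".", skipping the dot at the root.
  private def join(parent: String, child: String): String =
    if (parent.isEmpty) child else s"$parent.$child"

  // Returns the path of every struct type with zero fields, at any nesting depth.
  def emptyStructPaths(dt: DataType, path: String = ""): Seq[String] = dt match {
    case s: StructType if s.fields.isEmpty => Seq(path)
    case s: StructType =>
      s.fields.toSeq.flatMap(f => emptyStructPaths(f.dataType, join(path, f.name)))
    case a: ArrayType => emptyStructPaths(a.elementType, join(path, "element"))
    case m: MapType =>
      emptyStructPaths(m.keyType, join(path, "key")) ++
        emptyStructPaths(m.valueType, join(path, "value"))
    case _ => Seq.empty // atomic types such as float or integer cannot be empty structs
  }
}
```

Applied to the schema above, EmptyStructFinder.emptyStructPaths(dsReduced.schema) should return muons.element.reco::Candidate, muons.element.cachePolarFixed_ and muons.element.cacheCartesianFixed_.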

Update: I quickly created a stripped-down version, and that one works without any issues. Any insight would be really helpful!

VK

Answer 1:

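Parquet does not write empty structs: a Parquet group must contain at least one field, and that is the check (Preconditions.checkState in Types$BaseGroupBuilder.build, "Cannot build an empty group") that fails in the stack trace in the question. So this is a limitation of the format, not a problem with the data.

For more information, see https://issues.apache.org/jira/browse/SPARK-20593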
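One workaround, consistent with the observation that a stripped-down version writes fine, is to remove the empty structs before writing. This is a minimal sketch, assuming Spark's Scala API (org.apache.spark.sql.types, Row, SparkSession.createDataFrame); DropEmptyStructs, pruneType, pruneValue and dropEmptyStructs are hypothetical names, not Spark functions. It prunes the schema recursively, rewrites every Row to match, and rebuilds the DataFrame. A struct that loses all its fields, or an array/map whose element type ends up empty, is dropped as well.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): strips empty structs so Parquet can write the data.
object DropEmptyStructs extends Serializable {
  // Returns the type with every empty struct removed, or None if nothing is left.
  def pruneType(dt: DataType): Option[DataType] = dt match {
    case s: StructType =>
      val kept = s.fields.toSeq.flatMap { f =>
        pruneType(f.dataType).map(t => f.copy(dataType = t)).toList
      }
      if (kept.isEmpty) None else Some(StructType(kept))
    case a: ArrayType => pruneType(a.elementType).map(t => a.copy(elementType = t))
    case m: MapType =>
      for {
        k <- pruneType(m.keyType)
        v <- pruneType(m.valueType)
      } yield m.copy(keyType = k, valueType = v)
    case other => Some(other) // atomic types are kept unchanged
  }

  // Rewrites one value so it matches pruneType(dt); dt is the ORIGINAL type.
  // It recomputes pruneType for each field on every row: simple, not fast.
  def pruneValue(value: Any, dt: DataType): Any = (value, dt) match {
    case (null, _) => null
    case (row: Row, s: StructType) =>
      val kept = s.fields.zipWithIndex.collect {
        case (f, i) if pruneType(f.dataType).isDefined => pruneValue(row.get(i), f.dataType)
      }
      Row.fromSeq(kept.toSeq)
    case (xs: scala.collection.Seq[_], a: ArrayType) => xs.map(pruneValue(_, a.elementType))
    case (m: scala.collection.Map[_, _], mt: MapType) =>
      m.map { case (k, v) => pruneValue(k, mt.keyType) -> pruneValue(v, mt.valueType) }
    case (other, _) => other
  }

  // Rebuilds the DataFrame with the pruned schema by going through the RDD API.
  def dropEmptyStructs(spark: SparkSession, df: DataFrame): DataFrame = {
    val original = df.schema
    val pruned = pruneType(original) match {
      case Some(s: StructType) => s
      case _ => throw new IllegalArgumentException("every column is an empty struct")
    }
    val rows = df.rdd.map(r => pruneValue(r, original).asInstanceOf[Row])
    spark.createDataFrame(rows, pruned)
  }
}
```

With these helpers, the failing write from the question would become DropEmptyStructs.dropEmptyStructs(spark, dsReduced.toDF()).write.format("parquet").save(outputPathName). The round trip through df.rdd gives up Catalyst's optimized execution for this step; that is the cost of reaching structs nested inside arrays such as muons.element.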

VK


