Efficiently reading a nested Parquet column in Spark
Question: I have the following (simplified) schema:

root
 |-- event: struct (nullable = true)
 |    |-- spent: struct (nullable = true)
 |    |    |-- amount: decimal(34,3) (nullable = true)
 |    |    |-- currency: string (nullable = true)
 |    |
 |    ... ~20 other struct fields at the "event" level

I'm trying to sum the nested field:

spark.sql("select sum(event.spent.amount) from event")

According to the Spark metrics, I'm reading 18 GB from disk and it takes 2.5 min. However, when I select the top-level field:

spark.sql("select sum
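For anyone trying to reproduce this, here is a minimal sketch in Scala. The input path (/data/events.parquet) and app name are assumptions for illustration; spark.sql.optimizer.nestedSchemaPruning.enabled is a real Spark option (added in 2.4, enabled by default since 3.0) that lets the Parquet reader scan only the requested leaf column instead of materializing the whole "event" struct:

import org.apache.spark.sql.SparkSession

object NestedSumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-parquet-sum")
      // Nested schema pruning: with this on, reading event.spent.amount
      // should touch only that leaf column's Parquet pages, not all
      // ~20 sibling fields of the "event" struct.
      .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
      .getOrCreate()

    // Hypothetical input path; substitute the actual Parquet location.
    spark.read.parquet("/data/events.parquet").createOrReplaceTempView("event")

    // The query from the question; compare the scan's input-size metric
    // in the Spark UI with and without the pruning config.
    spark.sql("select sum(event.spent.amount) from event").show()
  }
}

Whether pruning actually kicked in is visible in the physical plan: the ReadSchema of the FileScan node (shown by df.explain()) should list only event.spent.amount rather than the full struct.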