Spark Parquet statistics (min/max) integration

情深已故 2021-01-02 13:38

I have been looking into how Spark stores statistics (min/max) in Parquet as well as how it uses the info for query optimization. I have got a few questions. First setup: Sp

3 answers
  •  日久生厌
    2021-01-02 14:06

    For the first question, I believe this is a matter of definition: what would the min/max of a string be? Lexicographic ordering? In any case, as far as I know, Spark's Parquet support currently only uses min/max statistics for numeric columns.
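
    To make the "lexical ordering" question concrete, here is a minimal sketch of what a min/max over strings looks like if values are compared as unsigned UTF-8 bytes. The sample values are made up for illustration, and whether a given Spark/Parquet version actually writes or uses such statistics is not something this snippet shows.

```python
# Hypothetical column values, chosen only for illustration.
values = ["apple", "Banana", "cherry"]

# Compare the UTF-8 encodings; Python compares bytes objects
# byte by byte as unsigned integers.
encoded = [v.encode("utf-8") for v in values]

col_min = min(encoded)
col_max = max(encoded)

print(col_min, col_max)
```

    Under byte-wise ordering, uppercase ASCII letters sort before lowercase ones, so the minimum is "Banana" rather than "apple". This is one reason the "min" of a string column may not match what a reader expects.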

    As for the second question, I believe that if you look deeper you will see that Spark is not loading the data in the files. Instead, it reads only the metadata of each file to decide whether a block needs to be read at all. So, in effect, it is pushing the predicate down to the file (block) level.
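
    The block-skipping idea described above can be sketched in plain Python. This is not Spark's or Parquet's actual code: `RowGroupStats`, `may_contain` and `row_groups_to_read` are hypothetical names, and the "footer" is just a list of made-up min/max pairs standing in for the metadata.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the min/max kept per block in the metadata."""
    row_group: int
    min_value: int
    max_value: int


def may_contain(stats, op, literal):
    """Return False only when the statistics prove that no row can match."""
    if op == "=":
        return stats.min_value <= literal <= stats.max_value
    if op == ">":
        return stats.max_value > literal
    if op == "<":
        return stats.min_value < literal
    # Unknown predicate: the statistics cannot rule anything out.
    return True


def row_groups_to_read(footer, op, literal):
    """Pick the blocks that still have to be read for the given predicate."""
    return [s.row_group for s in footer if may_contain(s, op, literal)]


# Made-up statistics for three blocks of one integer column.
footer = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]

print(row_groups_to_read(footer, ">", 150))  # blocks 1 and 2 remain
```

    For a filter like `value > 150`, block 0 is skipped using its statistics alone, without reading any of its data. Blocks that survive the check still have to be read and filtered row by row, since min/max can only prove that a block has no match, never that every row matches.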
