Spark Parquet Statistics (min/max) integration


情深已故 2021-01-02 13:38

I have been looking into how Spark stores statistics (min/max) in Parquet as well as how it uses the info for query optimization. I have got a few questions. First setup: Sp

3 Answers

时光取名叫无心 (original poster)

2021-01-02 14:17

PARQUET-686 changed Parquet to intentionally ignore statistics on binary fields where that seems appropriate. You can override this behavior by setting parquet.strings.signed-min-max.enabled to true.
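One way to pass this property from PySpark is sketched below. It assumes Parquet reads the property from the Hadoop configuration, and it relies on Spark copying every spark.hadoop.* entry into that configuration. The helper names spark_conf_entries and build_session are made up for illustration, and build_session needs pyspark installed.

```python
# Parquet property named in the answer above.
SIGNED_MIN_MAX_KEY = "parquet.strings.signed-min-max.enabled"


def spark_conf_entries(enabled=True):
    """Return Spark config entries that forward the property to Hadoop.

    Hypothetical helper: Spark strips the "spark.hadoop." prefix and puts
    the remainder of the key into the Hadoop Configuration.
    """
    return {"spark.hadoop." + SIGNED_MIN_MAX_KEY: str(enabled).lower()}


def build_session(app_name="parquet-stats-demo"):
    """Create a SparkSession with the property applied (needs pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in spark_conf_entries().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same key/value pair could also be given on the command line with --conf when submitting the job.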

After setting that config, you can read the min/max values of binary fields with parquet-tools.
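Some background on why the property mentions "signed". This is my own reading of the name, not something stated in the ticket text quoted here. Java bytes are signed, so comparing UTF-8 strings byte by byte as signed values puts non-ASCII characters before ASCII ones, and a min/max computed that way can disagree with the usual unsigned ordering. A small pure-Python illustration (not Parquet code; signed_key is a hypothetical helper):

```python
def signed_key(data):
    """Reinterpret each byte as a Java-style signed byte (-128..127)."""
    return [b - 256 if b >= 128 else b for b in data]


# "\u00e9clair" starts with a non-ASCII character (UTF-8 bytes 0xC3 0xA9).
values = ["apple", "zebra", "\u00e9clair"]
encoded = [v.encode("utf-8") for v in values]

# Python compares bytes objects as unsigned values.
unsigned_min = min(encoded).decode("utf-8")
unsigned_max = max(encoded).decode("utf-8")

# Signed comparison: 0xC3 becomes -61 and sorts before every ASCII byte.
signed_min = min(encoded, key=signed_key).decode("utf-8")
signed_max = max(encoded, key=signed_key).decode("utf-8")

print("unsigned:", unsigned_min, unsigned_max)
print("signed:  ", signed_min, signed_max)
```

For pure ASCII data both orderings agree, which is presumably why enabling the property is safe in that case.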

More details are in another Stack Overflow question of mine.
