parquet-mr

PySpark Write Parquet Binary Column with Stats (signed-min-max.enabled)

Submitted by 房东的猫 on 2020-05-13 14:14:33
Question: I found this apache-parquet ticket, https://issues.apache.org/jira/browse/PARQUET-686, which is marked as resolved for parquet-mr 1.8.2. The feature I want is the calculated min/max in the Parquet metadata for a (string or BINARY) column. An email that references it, https://lists.apache.org/thread.html/%3CCANPCBc2UPm+oZFfP9oT8gPKh_v0_BF0jVEuf=Q3d-5=ugxSFbQ@mail.gmail.com%3E, uses Scala instead of PySpark in its example: Configuration conf = new Configuration(); + conf.set("parquet