Does Parquet predicate pushdown work on S3 using Spark (non-EMR)?


-上瘾入骨i 2020-12-05 21:08

Just wondering whether Parquet predicate pushdown also works on S3, not only on HDFS, specifically when using Spark outside EMR.

Further explanation might be helpful since it m

5 Answers

执念已碎 (OP)

2020-12-05 21:23

Here are the keys I'd recommend for S3A work:

spark.sql.parquet.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.hadoop.parquet.enable.summary-metadata false

spark.sql.orc.filterPushdown true
spark.sql.orc.splits.include.file.footer true
spark.sql.orc.cache.stripe.details.size 10000

spark.sql.hive.metastorePartitionPruning true
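As a rough sketch of how the keys above could be applied, the snippet below keeps them in a dictionary and either renders them as `--conf` arguments for `spark-submit` or sets them on a `SparkSession` builder. The helper names (`to_spark_submit_args`, `build_session`) and the application name are made up for illustration; only the configuration keys and values come from the answer.

```python
# Settings copied from the list above. Values are strings because Spark
# configuration values are passed as text.
S3A_READ_SETTINGS = {
    "spark.sql.parquet.filterPushdown": "true",
    "spark.sql.parquet.mergeSchema": "false",
    "spark.hadoop.parquet.enable.summary-metadata": "false",
    "spark.sql.orc.filterPushdown": "true",
    "spark.sql.orc.splits.include.file.footer": "true",
    "spark.sql.orc.cache.stripe.details.size": "10000",
    "spark.sql.hive.metastorePartitionPruning": "true",
}


def to_spark_submit_args(settings):
    """Render settings as a flat list of --conf arguments for spark-submit.

    Hypothetical helper, not part of Spark.
    """
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


def build_session(app_name="s3a-pushdown-demo"):
    """Create a SparkSession with the settings applied (requires pyspark).

    Hypothetical helper; the app name is an arbitrary placeholder.
    """
    # Imported lazily so the helper above also works without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in S3A_READ_SETTINGS.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

One way to check whether a filter is actually pushed down is to read a Parquet dataset through an `s3a://` path, apply a filter, and call `explain()` on the resulting DataFrame; the Parquet scan node in the physical plan lists the pushed predicates under `PushedFilters`.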

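For committing the work, use the S3A "zero rename committer" (Hadoop 3.1+) or the EMR equivalent. The original FileOutputCommitters are slow and unsafe on S3, because they commit output by renaming directories, which S3 implements as a non-atomic copy and delete.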
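The answer does not name the committer settings themselves. The sketch below uses the keys that Spark's cloud integration documentation lists for the S3A committers, as best I recall them; treat them as an assumption and check them against your Spark and Hadoop versions. They also assume the `spark-hadoop-cloud` module is on the classpath. The `directory` committer is chosen only as an example, and `to_defaults_conf` is a hypothetical helper.

```python
# Assumed S3A committer settings (not from the answer). Verify against the
# Spark cloud integration docs for your version before relying on them.
S3A_COMMITTER_SETTINGS = {
    # Which S3A committer to use; "directory" is one of the available options.
    "spark.hadoop.fs.s3a.committer.name": "directory",
    # Route Spark's commit protocol through the Hadoop PathOutputCommitter API.
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    # Parquet needs a binding class so it also uses the configured committer.
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
}


def to_defaults_conf(settings):
    """Render settings as 'key value' lines, the layout of spark-defaults.conf.

    Hypothetical helper, not part of Spark.
    """
    return "\n".join(f"{key} {value}" for key, value in settings.items())


if __name__ == "__main__":
    print(to_defaults_conf(S3A_COMMITTER_SETTINGS))
```

The rendered lines have the same `key value` layout as the settings listed earlier in this answer, so they can sit alongside them in `spark-defaults.conf`.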
