Does Parquet predicate pushdown work on S3 using Spark (non-EMR)?

-上瘾入骨i 2020-12-05 21:08

Just wondering if Parquet predicate pushdown also works on S3, not only HDFS — specifically when using Spark (non-EMR).

Further explanation might be helpful.

5 Answers
  •  执念已碎
    2020-12-05 21:23

    Here are the settings I'd recommend for s3a work:

    spark.sql.parquet.filterPushdown true
    spark.sql.parquet.mergeSchema false
    spark.hadoop.parquet.enable.summary-metadata false
    
    spark.sql.orc.filterPushdown true
    spark.sql.orc.splits.include.file.footer true
    spark.sql.orc.cache.stripe.details.size 10000
    
    spark.sql.hive.metastorePartitionPruning true
    
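    To illustrate how these settings might be wired up, here is a small Python sketch (plain stdlib, no Spark required) that renders the keys above as `--conf` flags for `spark-submit`. The setting names come from the answer; the `to_submit_flags` helper is purely illustrative:

    ```python
    # The recommended s3a settings from the answer above, as a dict.
    settings = {
        "spark.sql.parquet.filterPushdown": "true",
        "spark.sql.parquet.mergeSchema": "false",
        "spark.hadoop.parquet.enable.summary-metadata": "false",
        "spark.sql.orc.filterPushdown": "true",
        "spark.sql.orc.splits.include.file.footer": "true",
        "spark.sql.orc.cache.stripe.details.size": "10000",
        "spark.sql.hive.metastorePartitionPruning": "true",
    }

    def to_submit_flags(conf: dict) -> str:
        """Render a config dict as spark-submit --conf arguments."""
        return " ".join(f"--conf {k}={v}" for k, v in conf.items())

    print(to_submit_flags(settings))
    ```

    The same keys can equally be set in `spark-defaults.conf` or via `SparkSession.builder.config(...)` at session startup.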

    For committing the work, use the S3A "zero-rename committer" (Hadoop 3.1+) or the EMR equivalent. The original FileOutputCommitters are slow and unsafe on S3.
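    For reference, enabling the S3A directory committer typically involves settings along these lines (key names per the Hadoop S3A committer documentation — verify against your Hadoop and Spark versions before relying on them):

        spark.hadoop.fs.s3a.committer.name directory
        spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
        spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter

    The last two classes ship in the `spark-hadoop-cloud` module, which must be on the classpath.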
