Does Parquet predicate pushdown work on S3 when using Spark (non-EMR)?

-上瘾入骨i 2020-12-05 21:08

Just wondering whether Parquet predicate pushdown also works on S3, not only on HDFS, specifically when using Spark (non-EMR).

Further explanation might be helpful since it m

5 answers
  •  北荒 (OP)
    2020-12-05 21:35

    I recently tried this with Spark 2.4, and predicate pushdown seems to work with S3.

    This is the Spark SQL query:

    explain select * from default.my_table where month = '2009-04' and site = 'http://jdnews.com/sports/game_1997_jdnsports__article.html/play_rain.html' limit 100;
    

    And here is the relevant part of the output:

    PartitionFilters: [isnotnull(month#6), (month#6 = 2009-04)], PushedFilters: [IsNotNull(site), EqualTo(site,http://jdnews.com/sports/game_1997_jdnsports__article.html/play_ra...
    

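    This clearly shows that PushedFilters is not empty: the month predicate is used to prune partitions (PartitionFilters), and the site predicates are handed to the Parquet reader (PushedFilters).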
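If you would rather check this programmatically than by eye, here is a minimal sketch that pulls the entries out of a "PartitionFilters: [...]" or "PushedFilters: [...]" list in pasted EXPLAIN text. The helper name extract_filters is hypothetical, and it assumes the "Label: [a, b, ...]" layout shown above. It splits only on top-level commas, so EqualTo(site,http://...) stays a single entry, and it tolerates a list that is cut off with "...".

```python
from typing import List


def extract_filters(plan: str, label: str) -> List[str]:
    """Return the entries of the 'label: [...]' list in Spark EXPLAIN text.

    Hypothetical helper: splits on commas at bracket/paren depth 0 only, and
    stops at the list's closing bracket, a newline, or the end of the text
    (so a list truncated with '...' is returned as far as it goes).
    """
    marker = label + ": ["
    start = plan.find(marker)
    if start == -1:
        return []  # no such list in this plan text
    entries: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in plan[start + len(marker):]:
        if ch == "\n":
            break
        if ch in "([":
            depth += 1
        elif ch in ")]":
            if depth == 0:
                break  # closing bracket of the list itself
            depth -= 1
        elif ch == "," and depth == 0:
            # Top-level comma: finish the current entry.
            entries.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        entries.append(tail)
    return entries


# The fragment of EXPLAIN output quoted in the answer (truncated as shown).
sample = (
    "PartitionFilters: [isnotnull(month#6), (month#6 = 2009-04)], "
    "PushedFilters: [IsNotNull(site), EqualTo(site,http://jdnews.com/sports/"
    "game_1997_jdnsports__article.html/play_ra..."
)
print(extract_filters(sample, "PartitionFilters"))
print(extract_filters(sample, "PushedFilters"))
```

An empty list, as in "PushedFilters: []", would mean no data filters were offered to the Parquet reader for that scan.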

    Note: the table used here was created on top of AWS S3.
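As for why S3 versus HDFS should not matter: pushed filters are evaluated by the Parquet reader itself, against per-row-group column statistics stored in the file footer, so the filesystem only has to serve the reads. The toy model below is plain Python, not Spark or parquet-mr code; the RowGroupStats class, the row_groups_to_read function, and the statistics values are all made up for illustration. It shows only the basic min/max check for an equality predicate like EqualTo(site, ...): a row group is read only if the value can fall within its [min, max] range. Real readers have additional techniques beyond this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    # Hypothetical: min/max of one string column within one row group,
    # as a file footer would record them.
    min_value: str
    max_value: str


def row_groups_to_read(stats: List[RowGroupStats], value: str) -> List[int]:
    """Return indices of row groups that may contain rows with column == value.

    Any row group whose [min_value, max_value] range excludes value is
    skipped; only the footer statistics are consulted to decide this.
    """
    return [i for i, s in enumerate(stats) if s.min_value <= value <= s.max_value]


# Made-up statistics for the "site" column across three row groups.
stats = [
    RowGroupStats("http://a.com", "http://f.com"),
    RowGroupStats("http://g.com", "http://m.com"),
    RowGroupStats("http://n.com", "http://z.com"),
]
url = "http://jdnews.com/sports/game_1997_jdnsports__article.html/play_rain.html"
print(row_groups_to_read(stats, url))  # [1]
```

Only row group 1 would be fetched here. The same check runs whether the file lives on HDFS or S3.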
