Just wondering if Parquet predicate pushdown also works on S3, not only HDFS. Specifically if we use Spark (non EMR).
Further explanation might be helpful since it m
Yes. Filter pushdown does not depend on the underlying file system. It only depends on the spark.sql.parquet.filterPushdown and the type of filter (not all filters can be pushed down).
See https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L313 for the pushdown logic.