Impala: How to query against multiple parquet files with different schemata
In Spark 2.1 I often use something like df = spark.read.parquet("/path/to/my/files/*.parquet") to load a folder of parquet files, even when they have different schemata. Then I run SQL queries against the resulting DataFrame using Spark SQL.

Now I want to try Impala because I read the wiki article, which contains sentences like:

"Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop [...]. Reads Hadoop file formats, including text, LZO, SequenceFile, Avro, RCFile, and Parquet."

So it sounds like it could also fit to