I have recently started looking into querying large sets of CSV data stored on HDFS using Hive and Impala. As I expected, I get better response times with Impala than with Hive.
I can think of the following reasons why Impala is faster, especially on complex SELECT statements: