How does Impala provide faster query response compared to Hive?
I recently started querying large sets of CSV data stored on HDFS using Hive and Impala. As expected, I get better response times with Impala than with Hive for the queries I have run so far. Are there types of queries or use cases that still need Hive, where Impala is not a good fit? And how does Impala provide faster query responses than Hive for the same data on HDFS?

You should see Impala as "SQL on HDFS", while Hive is more "SQL on Hadoop". In other words, Impala doesn't use Hadoop's MapReduce engine at all. It simply has daemons running on all your