Parquet-backed Hive table: array column not queryable in Impala
Question: Although Impala is much faster than Hive, we used Hive because it supports complex (nested) data types such as arrays and maps. I notice that Impala, as of CDH 5.5, now supports complex data types. Since it's also possible to run Hive UDFs in Impala, we can probably do everything we want in Impala, but much, much faster. That's great news! As I scan through the documentation, I see that Impala expects data to be stored in Parquet format. My data, in its raw form, happens to be a two-column