Impala: How to query against multiple parquet files with different schemata

纵饮孤独 提交于 2019-12-06 01:41:07

From what I understand, you have some parquet files and you want to see them through impala tables? Below is my explanation on it.

You can create an external table and set the location to the parquet files directory like below

CREATE EXTERNAL TABLE ingest_parquet_files(col1 string, col2 string) LOCATION "/path/to/my/files/" STORED AS PARQUET;

You have another option of loading the parquet files after creating the table

LOAD DATA INPATH "Your/HDFS/PATH" INTO TABLE schema.ingest_parquet_files;

What you are trying will also work, you have to remove the wildcard character, because it expects a path after the LIKE PARQUET, and looks for the files in the location.

CREATE EXTERNAL TABLE ingest_parquet_files LIKE PARQUET 
'/path/to/my/files/'
STORED AS PARQUET
LOCATION '/tmp';

Below is the template you can refer which is pulled from Cloudera impala doc.

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE PARQUET 'hdfs_path_of_parquet_file'
  [COMMENT 'table_comment']
  [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
  [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
  [
   [ROW FORMAT row_format] [STORED AS file_format]
  ]
  [LOCATION 'hdfs_path']
  [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
  [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
data_type:
    primitive_type
  | array_type
  | map_type
  | struct_type

Please note that the user you are using should have read-write access to any path you are giving to impala. You can achieve it by performing with the below steps

#Login as hive superuser to perform the below steps
create role <role_name_x>;

#For granting to database
grant all on database to role <role_name_x>;

#For granting to HDFS path
grant all on URI '/hdfs/path' to role <role_name_x>;

#Granting the role to the user you will use to run the impala job
grant role <role_name_x> to group <your_user_name>;

#After you perform the below steps you can validate with the below commands
#grant role should show the URI or database access when you run the grant role check on the role name as below

show grant role <role_name_x>;

#Now to validate if the user has access to the role

show role grant group <your_user_name>;

More on how the roles and permissions are here

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!