Hive---外部分区表的创建

匿名 (未验证) 提交于 2019-12-03 00:34:01

Hive---外部分区表的创建

(1)假设有个分区表,数据如下:

hive> show create table partition_parquet; OK CREATE TABLE `partition_parquet`(   `member_id` string,   `name` string,   `add_item` string) PARTITIONED BY (   `stat_date` string,   `province` string) ROW FORMAT SERDE   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES (   'field.delim'='\t',   'serialization.format'='\t') STORED AS INPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION   'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet' TBLPROPERTIES (   'last_modified_by'='a6',   'last_modified_time'='1525229204',   'transient_lastDdlTime'='1525229204') Time taken: 0.173 seconds, Fetched: 22 row(s)

部分数据如下:

hive> SELECT * FROM partition_parquet where stat_date='20110527' and province ='liaoning'; OK 1	liujiannan	NULL	20110527	liaoning 2	wangchaoqun	NULL	20110527	liaoning 3	xuhongxing	NULL	20110527	liaoning 4	zhudaoyong	NULL	20110527	liaoning 5	zhouchengyu	NULL	20110527	liaoning

存储目录如下;

bogon:bin a6$ hadoop dfs -ls -R hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/ DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.  18/06/23 19:34:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable drwxr-xr-x   - a6 supergroup          0 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning -rwxr-xr-x   1 a6 supergroup        437 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning/000000_0

(2)不好的例子――外部分区表的创建及数据导入

 CREATE external TABLE `partition_external_parquet`(   `member_id` string,   `name` string,   `add_item` string) PARTITIONED BY (   `stat_date` string,   `province` string) ROW FORMAT SERDE   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES (   'field.delim'='\t',   'serialization.format'='\t') STORED AS INPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION   'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning'
但是此时表中数据并没有显示,如下:

 hive> SELECT * FROM partition_external_parquet; OK Time taken: 1.695 seconds
原因:没有加入分区
接下来我们加入分区.

 hive> alter table partition_external_parquet add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning'; OK Time taken: 0.836 seconds
在此查看数据:
hive> SELECT * FROM partition_external_parquet; OK SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. 1	liujiannan	NULL	20110527	liaoning 2	wangchaoqun	NULL	20110527	liaoning 3	xuhongxing	NULL	20110527	liaoning 4	zhudaoyong	NULL	20110527	liaoning 5	zhouchengyu	NULL	20110527	liaoning Time taken: 1.474 seconds, Fetched: 5 row(s)

(3)良好的例子――外部分区表的创建及数据导入

hive> create external table if not exists partition_external_parquet like partition_parquet; OK Time taken: 0.106 seconds
hive> show create table partition_external_parquet2; OK CREATE EXTERNAL TABLE `partition_external_parquet2`(   `member_id` string,   `name` string,   `add_item` string) PARTITIONED BY (   `stat_date` string,   `province` string) ROW FORMAT SERDE   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES (   'field.delim'='\t',   'serialization.format'='\t') STORED AS INPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION   'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_external_parquet2' TBLPROPERTIES (   'transient_lastDdlTime'='1529753081') Time taken: 0.057 seconds, Fetched: 20 row(s)

以静态全静态分区的形式导入数据

hive>  alter table partition_external_parquet2 add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning'; OK Time taken: 0.078 seconds
hive> select * from partition_external_parquet2; OK 1	liujiannan	NULL	20110527	liaoning 2	wangchaoqun	NULL	20110527	liaoning 3	xuhongxing	NULL	20110527	liaoning 4	zhudaoyong	NULL	20110527	liaoning 5	zhouchengyu	NULL	20110527	liaoning Time taken: 0.133 seconds, Fetched: 5 row(s)
参考:https://blog.csdn.net/a2011480169/article/details/51991421

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!