External table does not return the data in its folder

Posted by 别来无恙 on 2019-12-01 21:22:31

You created your table as a partitioned table on the column datehour, but you are putting your data directly in /user/cloudera/data. Hive will look for data in /user/cloudera/data/datehour=(some int value). Because it is an external table and the files were added outside Hive, the metastore is not updated automatically. You need to run an ALTER statement to register the partition.
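
For illustration, a table definition that leads to this situation might look like the sketch below. The table name tb, the columns id and msg, and the field delimiter are assumptions made up for the example; only the datehour partition column and the /user/cloudera/data location come from the question.

-- hypothetical DDL: a partitioned external table rooted at /user/cloudera/data
hive> CREATE EXTERNAL TABLE tb (id INT, msg STRING)
    > PARTITIONED BY (datehour INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LOCATION '/user/cloudera/data';
-- a freshly created table has no partitions registered, so this lists nothing
hive> SHOW PARTITIONS tb;

Files dropped straight into /user/cloudera/data do not belong to any registered partition, which is why a SELECT on such a table returns no rows.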

So here are the steps for an external table with partitions:

1.) In your external location /user/cloudera/data, create a directory datehour=0909201401 and put the data files in it

                                OR

Load data using: LOAD DATA [LOCAL] INPATH '/path/to/data/file' INTO TABLE table_name PARTITION (datehour=0909201401)

2.) After creating your table and the partition directory, run an ALTER statement to register the partition: ALTER TABLE table_name ADD PARTITION (datehour=0909201401)
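
The steps above can be sketched end to end from the Hive CLI, which can run HDFS commands through its dfs command. This is only a sketch: the table name tb is a placeholder, '/path/to/data/file' stands for a local data file, and the explicit LOCATION clause is an added precaution that is not part of the steps above. It is there because, for a numeric partition column, Hive may normalize the value 0909201401 and derive a directory name without the leading zero.

-- step 1: create the partition directory and copy the data file into it
hive> dfs -mkdir -p /user/cloudera/data/datehour=0909201401;
hive> dfs -put /path/to/data/file /user/cloudera/data/datehour=0909201401/;
-- step 2: register the partition in the metastore, pointing at that directory
hive> ALTER TABLE tb ADD PARTITION (datehour=0909201401)
    > LOCATION '/user/cloudera/data/datehour=0909201401';
-- check that the partition is listed and that rows are now returned
hive> SHOW PARTITIONS tb;
hive> SELECT * FROM tb LIMIT 10;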

Hope it helps...!!!

When we create an EXTERNAL TABLE with a PARTITION, we have to ALTER the table to register the data location for each partition. However, that location need not be under the path we specified when creating the EXTERNAL TABLE.

hive> ALTER TABLE tb ADD PARTITION (datehour=0909201401)
    > LOCATION '/user/cloudera/data/somedatafor_datehour';
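
To confirm that the partition was registered and points at the intended directory, statements like the following can be used. The output is omitted here because its exact layout depends on the Hive version.

-- list the partitions the metastore knows about for tb
hive> SHOW PARTITIONS tb;
-- show partition details, including its Location field
hive> DESCRIBE FORMATTED tb PARTITION (datehour=0909201401);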

When we specify LOCATION '/user/cloudera/data' (though it's optional) while creating an EXTERNAL TABLE, we can take advantage of repair operations on that table. When some process such as an ETL job copies files into that directory, we can sync the partitions with the EXTERNAL TABLE instead of writing an ALTER TABLE statement for each new partition.

If we already know the directory structure that Hive would create for the partition, we can simply place the data file in that location, e.g. '/user/cloudera/data/datehour=0909201401/data.txt', and run the statement shown below:

hive> MSCK REPAIR TABLE tb;  

The above statement syncs the partitions found under the table's location into the Hive metastore for the table "tb".
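
A sketch of that workflow from the Hive CLI, assuming the table tb was created with LOCATION '/user/cloudera/data' and that data.txt is a local file (both names follow the example above):

-- place the file in a directory that follows the key=value partition layout
hive> dfs -mkdir -p /user/cloudera/data/datehour=0909201401;
hive> dfs -put data.txt /user/cloudera/data/datehour=0909201401/;
-- let Hive discover partition directories missing from the metastore
hive> MSCK REPAIR TABLE tb;
-- the discovered partition should now be listed
hive> SHOW PARTITIONS tb;

MSCK REPAIR TABLE only picks up directories under the table location that follow the key=value naming. A directory such as somedatafor_datehour still needs an explicit ALTER TABLE ... ADD PARTITION ... LOCATION statement.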
