Hive create table with inputs from nested sub-directories

本秂侑毒 submitted on 2019-12-01 03:16:37

Use these Hive settings to make Hive read input directories recursively:

set hive.mapred.supports.subdirectories=TRUE;
set mapred.input.dir.recursive=TRUE;

Create an external table and specify the root directory as its location:

LOCATION 'hdfs://.../data'

You will then be able to query data from the table location and all of its subdirectories.
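Putting the two steps together, a minimal sketch might look like the following. The table name `events_raw`, its columns, and the tab-delimited text format are hypothetical placeholders, not part of the original answer; the location is left elided exactly as above and must be replaced with your real root directory.

```sql
-- Enable recursive reads of the table location (settings from the answer above).
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;

-- Hypothetical table name, columns, and file format; adjust to your data.
CREATE EXTERNAL TABLE IF NOT EXISTS events_raw (
  id      STRING,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'hdfs://.../data';

-- With the settings above, this should also count rows in files under nested subdirectories.
SELECT COUNT(*) FROM events_raw;
```

The `SET` statements apply per session, so they need to be issued in every session that queries the table unless they are configured globally.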

Dhruv Kapur

One way to solve your problem is to add the folder name as a partition column on the external table. You can then create the table directly on the data directory, as you are doing now. Alternatively, you can flatten the nested files into a single directory.
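A rough sketch of the partition-column approach, assuming two subfolders named `a` and `b` under the data directory. The table name `events_by_folder`, the columns, the partition column `folder`, and the subfolder names are all hypothetical; `STORED AS AVRO` assumes Hive 0.14 or later.

```sql
-- Hypothetical table: one partition per subfolder of the data directory.
CREATE EXTERNAL TABLE IF NOT EXISTS events_by_folder (
  id      STRING,
  payload STRING
)
PARTITIONED BY (folder STRING)
STORED AS AVRO
LOCATION 'hdfs://.../data';

-- Register each subfolder explicitly; 'a' and 'b' are placeholder folder names.
ALTER TABLE events_by_folder ADD IF NOT EXISTS
  PARTITION (folder = 'a') LOCATION 'hdfs://.../data/a'
  PARTITION (folder = 'b') LOCATION 'hdfs://.../data/b';

-- The folder name is then available as an ordinary column.
SELECT folder, COUNT(*) FROM events_by_folder GROUP BY folder;
```

Each new subfolder needs its own `ADD PARTITION` statement, so this suits layouts where folders are added in a predictable way.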

Otherwise, I don't think you can get Hive to treat the contents of all these folders as a single table.

This question seems to address a similar issue: "When creating an external table in Hive, can I point the location to specific files in a directory?"

There is an open JIRA issue on the same topic: https://issues.apache.org/jira/browse/HIVE-951

Browsing further, I found a post suggesting SymlinkTextInputFormat as an alternative. I am not sure how well this would work with your Avro format: https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.html
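For reference, a table backed by SymlinkTextInputFormat is usually declared along these lines. This is only a sketch: the table name, columns, and the `symlinks` directory are hypothetical. The table location holds plain text files that list the paths of the real data files or directories, one per line. The linked Javadoc states that the target data should be in text format, which is why the Avro case remains doubtful.

```sql
-- Hypothetical table whose location contains "symlink" text files;
-- each line of those files is the path of a real data file or directory.
CREATE EXTERNAL TABLE IF NOT EXISTS events_symlinked (
  id      STRING,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://.../symlinks';
```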
