Hadoop Hive query files from hdfs

Posted by 拈花ヽ惹草 on 2020-01-17 08:10:39

Question


If I build Hive on top of HDFS, do I need to put all the files into the hive/warehouse folder before processing them? Can I query any file that is in HDFS through Hive? How?


Answer 1:


You don't have to do anything special to run Hive on top of your existing HDFS cluster. Hive is designed to use HDFS as its default storage layer.

do I need to put all the files into hive/warehouse folder before processing them?

You don't have to do this either.

When you create a Hive table and load data from an HDFS file into it using the LOAD command, the base file is moved into the Hive warehouse automatically. You don't have to do anything explicitly. But this comes at a cost: if you drop such a table, your file is deleted along with it. Tables of this kind are called managed tables in Hive terminology.
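To make the managed-table behaviour concrete, here is a minimal HiveQL sketch. The table name `managed_logs` and the path `/user/demo/logs.txt` are made up for illustration and are not from the question.

```sql
-- Hypothetical table and path names, for illustration only.
CREATE TABLE managed_logs (line STRING);

-- Moves /user/demo/logs.txt from its current HDFS location into the
-- table's directory under the Hive warehouse.
LOAD DATA INPATH '/user/demo/logs.txt' INTO TABLE managed_logs;

-- Drops the metadata and the data files Hive manages for this table.
DROP TABLE managed_logs;
```

After the LOAD statement the file no longer exists at its original path, which is why dropping a managed table also costs you the data.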

To avoid this, you can use the other type of table Hive supports, the external table. When you create an external table, the base file is not moved into the warehouse; only the metadata associated with the table is added to the Hive metastore. And when you drop the table, only that metadata is removed from the metastore, and the base file stays where it is. You just have to specify where the data lives through the LOCATION clause when creating the external table. Note that LOCATION names the directory that holds the base file, not the file itself.
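A short sketch of the external-table lifecycle described above. The table name, columns, delimiter and the directory `/user/demo/events` are all assumptions for illustration; the directory is assumed to exist already and to contain tab-delimited text files.

```sql
-- Hypothetical names; /user/demo/events is assumed to be an existing
-- HDFS directory that already holds tab-delimited text files.
CREATE EXTERNAL TABLE events (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/demo/events';

-- Removes only the metastore entry; the files are left in place.
DROP TABLE events;

-- From the Hive CLI, list the directory to confirm the files survived.
dfs -ls /user/demo/events;
```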

Can I query any file that is in HDFS through Hive? How?

Yes. Create an external table that refers to the file through the LOCATION clause, pointing it at the directory that contains the file. You can then query the data inside the file like any other Hive table.
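As a rough sketch of querying files in place: the table `raw_lines` and the directory `/user/demo/raw` below are hypothetical. With a single STRING column and Hive's default text format, each line of the files is read into that column, assuming the data does not contain Hive's default field delimiter (Ctrl-A).

```sql
-- Hypothetical names; /user/demo/raw is assumed to hold plain text files.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_lines (line STRING)
LOCATION '/user/demo/raw';

-- The files are read where they are; nothing is copied or moved.
SELECT COUNT(*) FROM raw_lines;
SELECT line FROM raw_lines WHERE line LIKE '%ERROR%' LIMIT 10;
```

Every file in the directory becomes part of the table, so keep unrelated files out of it.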

Hope this answers your query.




Answer 2:


When you create a table in Hive, by default Hive will manage the data, which means that Hive moves the data into its warehouse directory. Alternatively, you may create an external table, which tells Hive to refer to the data that is at an existing location outside the warehouse directory.

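-- Data for this table lives in /user/external_table, not in the warehouse directory.
CREATE EXTERNAL TABLE external_table (dummy STRING)
LOCATION '/user/external_table';
-- Moves /user/data.txt into the table's LOCATION directory.
LOAD DATA INPATH '/user/data.txt' INTO TABLE external_table;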


Source: https://stackoverflow.com/questions/18879557/hadoop-hive-query-files-from-hdfs
