Question
If I build Hive on top of HDFS, do I need to put all the files into hive/warehouse folder before processing them? Can I query any file which is in hdfs by hive? How?
Answer 1:
You don't have to do anything special in order to run Hive on top of your existing HDFS cluster. This happens by virtue of Hive's architecture. Hive by default runs on HDFS.
do I need to put all the files into hive/warehouse folder before processing them?
You don't have to do this either.
When you create a Hive table and load data into it from an HDFS file using the LOAD DATA INPATH command, the file is automatically moved into the Hive warehouse directory. You don't have to do anything explicitly. But this comes at a cost: if you drop such a table, its data files are deleted as well. Tables of this kind are called managed tables in Hive terminology.
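To make the managed-table behaviour concrete, here is a minimal HiveQL sketch. The table name `managed_logs` and the path `/user/demo/logs.txt` are hypothetical, chosen only for illustration, and the default warehouse location (`/user/hive/warehouse`, controlled by `hive.metastore.warehouse.dir`) is assumed.

```sql
-- Hypothetical table and path names, for illustration only.
CREATE TABLE managed_logs (line STRING);

-- INPATH without LOCAL refers to an HDFS path. The file is moved
-- (not copied) into the table's directory under the warehouse,
-- so it no longer exists at /user/demo/logs.txt afterwards.
LOAD DATA INPATH '/user/demo/logs.txt' INTO TABLE managed_logs;

SELECT * FROM managed_logs LIMIT 10;

-- Dropping a managed table removes the metadata AND the data files.
DROP TABLE managed_logs;
```

If you need the original file to survive, this is the pattern to avoid.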
To avoid this, you can use the other type of table that Hive supports: external tables. When you create an external table, you specify where its data lives through the LOCATION clause, and the files stay where they are instead of being placed in the warehouse. Only the metadata associated with the table is added to the Hive metastore. When you drop the table, only that metadata is removed from the metastore; the underlying files are left in place.
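The drop behaviour of an external table can be sketched as follows. The table name `ext_logs` and the directory `/data/demo/logs` are assumptions made up for this example; the directory is presumed to already contain plain text files.

```sql
-- Hypothetical names; /data/demo/logs is assumed to already hold text files.
CREATE EXTERNAL TABLE ext_logs (line STRING)
LOCATION '/data/demo/logs';

-- The "Table Type" field in the output distinguishes
-- EXTERNAL_TABLE from MANAGED_TABLE.
DESCRIBE FORMATTED ext_logs;

-- Only the metastore entry is removed; the files under
-- /data/demo/logs are left untouched.
DROP TABLE ext_logs;
```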
Can I query any file which is in hdfs by hive? How?
Yes. Create an external table whose LOCATION clause points to the HDFS directory that contains the file (LOCATION takes a directory, not a single file). You can then query the data like any other Hive table.
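As a rough sketch of querying files that already sit in HDFS, assume a directory `/data/demo/page_views` holding tab-separated text files with three fields. The directory, table name, column names, and delimiter are all hypothetical; adjust them to match your actual data.

```sql
-- Hypothetical schema for tab-separated text files already in HDFS.
CREATE EXTERNAL TABLE page_views (
  user_id   STRING,
  url       STRING,
  view_time STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/demo/page_views';

-- Every file in the directory is read as part of the table.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;
```

No LOAD step is needed here: Hive applies the schema to the files when the query runs, so new files dropped into the directory show up in later queries.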
Hope this answers your query.
Answer 2:
When you create a table in Hive, by default Hive will manage the data, which means that Hive moves the data into its warehouse directory. Alternatively, you may create an external table, which tells Hive to refer to the data that is at an existing location outside the warehouse directory.
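-- The table's data lives in /user/external_table, outside the warehouse.
CREATE EXTERNAL TABLE external_table (dummy STRING)
LOCATION '/user/external_table';
-- Optional: moves /user/data.txt into /user/external_table.
-- Skip this step if the files are already in the LOCATION directory.
LOAD DATA INPATH '/user/data.txt' INTO TABLE external_table;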
Source: https://stackoverflow.com/questions/18879557/hadoop-hive-query-files-from-hdfs