hiveql

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Question: I have configured Hive as shown in this video: http://www.youtube.com/watch?v=Dqo1ahdBK_A, but I get the following error when creating a table in Hive. I am using hadoop-1.2.1 and hive-0.12.0.
hive> create table employee(emp_id int,name string,salary double);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Answer 1: This looks like a problem with your metastore.

How do I output the results of a HiveQL query to CSV?

Question: We would like to write the results of a Hive query to a CSV file. I thought the command should look like this: insert overwrite directory '/home/output.csv' select books from table; When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way? Thanks!

Answer 1: Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let
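
A minimal sketch of one alternative, assuming Hive 0.11 or later (where INSERT OVERWRITE DIRECTORY accepts a ROW FORMAT clause). Without the LOCAL keyword, the target path is resolved on the cluster's default filesystem (typically HDFS) and is created as a directory, which would explain why no file named output.csv appears on the local disk. The table name book_table and the path /tmp/books_csv are hypothetical.

```sql
-- Hypothetical names: book_table and /tmp/books_csv are placeholders.
-- LOCAL writes to the filesystem of the machine running the Hive client.
-- The target is a directory; Hive fills it with files such as 000000_0.
-- Existing contents of the target directory are overwritten.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/books_csv'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT books FROM book_table;
```

The output files can then be concatenated into a single CSV file, for example with cat /tmp/books_csv/* > books.csv. Values are written as-is, so a field that itself contains a comma is not quoted.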

Execute Hive Query with IN clause parameters in parallel

Question: I have a Hive query like the one below: select a.x as column from table1 a where a.y in (<long comma-separated list of parameters>) union all select b.x as column from table2 b where b.y in (<long comma-separated list of parameters>) I have set hive.exec.parallel to true, which gives me parallelism between the two queries joined by union all. But my IN clause has many comma-separated values, and each value is processed in one job before the next value is taken. This is actually getting

HIVE select count(*) non null returns higher value than select count(*)

Question: I am currently doing some data exploration with Hive and cannot explain the following behavior. Say I have a table (named mytable) with a field master_id. When I count the number of rows, I get select count(*) as c from mytable c 1129563 If I count the number of rows with a non-null master_id, I get a higher number: select count(*) as c from mytable where master_id is not null c 1134041 Additionally, master_id never seems to be null. select count(*) as c from mytable where master_id

Difference between Hive internal tables and external tables?

Question: Can anyone tell me the difference between Hive's external tables and internal tables? I know the difference shows up when dropping the table. I don't understand what it means that both the data and the metadata are deleted for internal tables, while only the metadata is deleted for external tables. Can anyone explain this to me in terms of nodes, please?

Answer 1: Hive has a relational database on the master node that it uses to keep track of state. For instance, when you CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/'; , this table
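
The drop behavior the question asks about can be sketched as follows. The table names and the /data/logs path are hypothetical, and the comments assume default settings, where the files of an internal (managed) table live under the Hive warehouse directory.

```sql
-- Internal (managed) table: Hive owns the data files.
CREATE TABLE managed_logs (line STRING);

-- External table: Hive records only metadata that points at existing files.
CREATE EXTERNAL TABLE external_logs (line STRING)
LOCATION '/data/logs';

-- Removes the metastore entry and deletes the table's data files.
DROP TABLE managed_logs;

-- Removes only the metastore entry; the files under /data/logs remain.
DROP TABLE external_logs;
```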

How to set variables in HIVE scripts

Question: I'm looking for the SQL equivalent of SET varname = value in HiveQL. I know I can do something like this: SET CURRENT_DATE = '2012-09-16'; SELECT * FROM foo WHERE day >= @CURRENT_DATE But then I get this error: character '@' not supported here

Answer 1: You need to use the special hiveconf namespace for variable substitution, e.g. hive> set CURRENT_DATE='2012-09-16'; hive> select * from foo where day >= '${hiveconf:CURRENT_DATE}' Similarly, you could pass it on the command line: % hive -hiveconf CURRENT_DATE=
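
A minimal sketch of the same substitution, assuming Hive replaces ${hiveconf:...} textually before the query is parsed. The variable name start_day is hypothetical; the value is set without quotes so that the quotes appear only once, at the point of use.

```sql
-- start_day is a hypothetical variable name.
SET start_day=2012-09-16;

-- After substitution this reads: WHERE day >= '2012-09-16'
SELECT * FROM foo WHERE day >= '${hiveconf:start_day}';
```

The value can also come from the command line instead of the SET line, for example hive --hiveconf start_day=2012-09-16 -f script.hql, where script.hql is a placeholder file name.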