hiveql

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

主宰稳场 submitted on 2019-11-26 09:36:49
Question: I have configured Hive as shown at http://www.youtube.com/watch?v=Dqo1ahdBK_A, but I get the following error when creating a table in Hive. I am using hadoop-1.2.1 and hive-0.12.0.

hive> create table employee(emp_id int,name string,salary double);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Answer 1: This looks like a problem with your metastore.

How do I output the results of a HiveQL query to CSV?

纵然是瞬间 submitted on 2019-11-26 07:55:59
Question: We would like to write the results of a Hive query to a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way? Thanks!

Answer 1: Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let
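For context on the question above: `insert overwrite directory` without the `LOCAL` keyword writes to HDFS, and the path names a directory of result files, not a single local file. One alternative is to run the query through the Hive CLI, which prints rows tab-separated, and convert that output. This is a minimal sketch, assuming the `hive` CLI is on the PATH. The helper name `hive_to_csv` is hypothetical, and the naive tab-to-comma conversion does not quote fields that themselves contain commas.

```shell
#!/bin/sh
# hive_to_csv QUERY OUTFILE   (hypothetical helper, not part of Hive)
# Runs QUERY with the Hive CLI, whose result rows are tab-separated,
# and rewrites every tab as a comma. Fields containing commas are
# NOT quoted, so this is only safe for simple values.
hive_to_csv() {
  hive -e "$1" | tr '\t' ',' > "$2"
}

# Example call, commented out so the snippet only defines the function:
# hive_to_csv 'select books from table' /home/output.csv
```

The output file is written on the machine where the command runs, which avoids searching HDFS for the result.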

Execute Hive Query with IN clause parameters in parallel

落花浮王杯 submitted on 2019-11-26 07:47:37
Question: I have a Hive query like the one below:

select a.x as column from table1 a where a.y in (<long comma-separated list of parameters>)
union all
select b.x as column from table2 b where b.y in (<long comma-separated list of parameters>)

I have set hive.exec.parallel to true, which gives me parallelism between the two queries joined by union all. But my IN clause has many comma-separated values, and each value is processed once in one job before the next value is taken. This is actually getting

HIVE select count(*) non null returns higher value than select count(*)

女生的网名这么多〃 submitted on 2019-11-26 05:56:44
Question: I am currently doing some data exploration with Hive and cannot explain the following behavior. Say I have a table (named mytable) with a field master_id. When I count the number of rows, I get

select count(*) as c from mytable
c
1129563

If I count the number of rows with a non-null master_id, I get a higher number

select count(*) as c from mytable where master_id is not null
c
1134041

Additionally, master_id never seems to be null.

select count(*) as c from mytable where master_id

Difference between Hive internal tables and external tables?

眉间皱痕 submitted on 2019-11-26 04:06:09
Question: Can anyone tell me the difference between Hive's external tables and internal tables? I know the difference shows up when dropping the table. I don't understand what it means that both data and metadata are deleted for internal tables while only metadata is deleted for external tables. Can anyone explain this in terms of nodes, please?

Answer 1: Hive has a relational database on the master node that it uses to keep track of state. For instance, when you CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/'; , this table
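The drop behavior the question asks about can be illustrated with a short script. This is a minimal sketch, not the answerer's own example: the helper name `table_ddl`, the table names `foo_managed` and `foo_external`, and the path `/tmp/foo_external` are all hypothetical. The function only prints the HiveQL. To try it against a real installation, save the output to a file and run it with `hive -f`.

```shell
#!/bin/sh
# table_ddl   (hypothetical helper): prints HiveQL that contrasts a
# managed (internal) table with an external one.
table_ddl() {
  cat <<'EOF'
-- Managed table: Hive keeps the data files under its warehouse directory.
CREATE TABLE foo_managed (foo STRING);

-- External table: Hive only records where the files live.
CREATE EXTERNAL TABLE foo_external (foo STRING)
LOCATION '/tmp/foo_external';

-- Dropping the managed table removes the metastore entry AND the data files.
DROP TABLE foo_managed;

-- Dropping the external table removes only the metastore entry;
-- the files under /tmp/foo_external stay in place.
DROP TABLE foo_external;
EOF
}

# Example use, commented out so the snippet only defines the function:
# table_ddl > tables.hql && hive -f tables.hql
```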

How to set variables in HIVE scripts

ε祈祈猫儿з submitted on 2019-11-26 01:56:58
Question: I'm looking for the SQL equivalent of SET varname = value in Hive QL. I know I can do something like this:

SET CURRENT_DATE = '2012-09-16';
SELECT * FROM foo WHERE day >= @CURRENT_DATE

But then I get this error:

character '@' not supported here

Answer 1: You need to use the special hiveconf namespace for variable substitution, e.g.

hive> set CURRENT_DATE='2012-09-16';
hive> select * from foo where day >= '${hiveconf:CURRENT_DATE}'

Similarly, you can pass it on the command line:

% hive -hiveconf CURRENT_DATE=
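The hiveconf substitution in the answer is textual: `${hiveconf:NAME}` is replaced by the variable's value before the query runs. Passing a value from the shell can be sketched as follows, assuming the `hive` CLI is on the PATH. The helper name `run_since` is hypothetical; the table `foo` and column `day` come from the question. In this sketch the value is passed without quotes and the query supplies them, so the substituted text forms a single string literal.

```shell
#!/bin/sh
# run_since DATE   (hypothetical helper, not part of Hive)
# Passes DATE to Hive as the hiveconf variable CURRENT_DATE.
# The query is single-quoted for the shell so that ${hiveconf:...}
# reaches Hive unexpanded; Hive then substitutes the value textually.
run_since() {
  hive --hiveconf "CURRENT_DATE=$1" \
       -e 'select * from foo where day >= "${hiveconf:CURRENT_DATE}"'
}

# Example call, commented out so the snippet only defines the function:
# run_since 2012-09-16
```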