hiveql

metastore_db created wherever I run Hive

大兔子大兔子 submitted on 2019-12-17 18:37:25
Question: The folder metastore_db is created in whatever directory I run a Hive query from. Is there any way to have only one metastore_db in a defined location and stop it from being created all over the place? Does it have anything to do with hive.metastore.local? Answer 1: The property of interest here is javax.jdo.option.ConnectionURL. The default value of this property is jdbc:derby:;databaseName=metastore_db;create=true. This value specifies that you will be using embedded Derby as your Hive metastore, and…
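A minimal hive-site.xml sketch of the fix the answer describes: pin the Derby database to one absolute path so metastore_db stops appearing in the current working directory (the path below is illustrative, not from the original answer):

```xml
<!-- hive-site.xml: point the embedded Derby metastore at a fixed,
     absolute path (the path is illustrative) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hive/metastore_db;create=true</value>
</property>
```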

Hive insert query like SQL

烂漫一生 submitted on 2019-12-17 17:32:04
Question: I am new to Hive and want to know whether there is any way to insert data into a Hive table the way we do in SQL. I want to insert my data into Hive like INSERT INTO tablename VALUES (value1, value2, …). I have read that you can load data from a file into a Hive table, or import data from one table into another, but is there any way to append data as in SQL? Answer 1: Some of the answers here are out of date as of Hive 0.14: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
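As a sketch of the Hive 0.14+ syntax the answer points to (the table name and values below are made up):

```sql
-- INSERT ... VALUES is supported from Hive 0.14 onwards
-- (table name and rows are illustrative)
CREATE TABLE students (name STRING, age INT);

INSERT INTO TABLE students VALUES ('Alice', 23), ('Bob', 31);
```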

Audit hive table

耗尽温柔 submitted on 2019-12-14 02:20:34
Question: I have a Hive table, let's call it table A. My requirement is to capture all the DML and DDL operations on table A in table B. Is there any way to capture these? Thanks in advance. Answer 1: I have not come across any such tool; however, Cloudera Navigator helps to manage it. Refer to the detailed documentation. Cloudera Navigator auditing supports tracking access to: HDFS entities accessed by HDFS, Hive, HBase, Impala, and Solr services; HBase and Impala; Hive metadata; Sentry; Solr…

Updating unique id column for newly added records in table in hive

荒凉一梦 submitted on 2019-12-13 20:14:55
Question: I have a table to which I want a unique identifier to be added automatically as each new record is inserted, given that the column for the unique identifier has already been created. Answer 1: Hive can't update a table in place, but you can create a temporary table or overwrite your first table. You can also use the concat function to join two different columns or strings. Here is an example. Function: concat(string A, string B, …); returns: string. hive> select concat('abc', 'def', 'gh') from dual; abcdefgh…
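Since Hive has no auto-increment, one common workaround (a sketch only; the table and column names here are hypothetical) is to continue numbering from the current maximum id with ROW_NUMBER():

```sql
-- Assign ids to newly staged rows, continuing from the current max id.
-- "my_table", "new_records" and "some_col" are hypothetical names.
INSERT INTO TABLE my_table
SELECT m.max_id + ROW_NUMBER() OVER (ORDER BY n.some_col) AS id,
       n.some_col
FROM new_records n
CROSS JOIN (SELECT COALESCE(MAX(id), 0) AS max_id FROM my_table) m;
```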

Can a hive script be run from another hive script?

别说谁变了你拦得住时间么 submitted on 2019-12-13 16:21:17
Question: I have created two Hive scripts, script1.hql and script2.hql. Is it possible to run script2.hql from script1.hql? I read about using the source command but could not figure out how to use it. Any pointers/reference docs will be appreciated. Answer 1: Use the source <filepath> command: source /tmp/script2.hql; --inside script1. The docs are here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli Hive will include the text of /tmp/script2.hql and execute it in the same context, so all…
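A small sketch of the "same context" point: variables set in the calling script remain visible in the sourced one (the file path and variable name are illustrative):

```sql
-- script1.hql: define a variable, then include script2.hql in-session
SET hivevar:db=analytics;
source /tmp/script2.hql;

-- script2.hql can then reference the caller's variable, e.g.:
-- SELECT * FROM ${hivevar:db}.events LIMIT 10;
```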

How to INSERT OVERWRITE a table in Hive with different WHERE clauses?

半腔热情 submitted on 2019-12-13 06:36:50
Question: I want to read a .tsv file from HBase into Hive. The file has a column family, which has 3 columns inside: news, social and all. The aim is to store these columns in a table in HBase which has the columns news, social and all. CREATE EXTERNAL TABLE IF NOT EXISTS topwords_logs (key String, columnfamily String, wort String, col String, occurance int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/hfu/Testdaten'; load data local inpath '/home/hfu/Testdaten…

Hive Runtime Error: Unable to deserialize reduce input key

五迷三道 submitted on 2019-12-13 05:55:29
Question: I am trying to run an INSERT into a partitioned table, with a GROUP BY involved, using this query: set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.execution.engine=tez; INSERT OVERWRITE TABLE table1 PARTITION (date) select col1, CONCAT(COALESCE(substr(Cdate,1,4),'-'), '', COALESCE(substr(Cdate,6,2),'-'), '', COALESCE(substr(Cdate,9,2),'-')), col3, col4, 'mobile-data', data, date from (select col1, substr(CDate,1,10) as Cdate, u.col3 as col3, u.col4 as col4, date, sum(u.col5+u…

Count the number of sessions if the beginning and end of each session is known [duplicate]

守給你的承諾、 submitted on 2019-12-13 04:22:13
Question: This question already has answers here: How to group by time interval in Spark SQL (2 answers). Closed 11 months ago. I have a Hive table with two columns holding date-time values: the start and finish of a "session". The following is a sample of such a table:

+----------------------+----------------------+
|      start_time      |       end_time       |
+----------------------+----------------------+
| 2017-01-01 00:24:52  | 2017-01-01 00:25:20  |
| 2017-01-01 00:31:11  | 2017-01-01 10:31:15  |
| 2017-01-01 10:31:15  | …
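One common approach to counting concurrently open sessions (a sketch, not necessarily the linked answer's exact solution; the table name "sessions" is assumed) is to turn each session into a +1 event at its start and a -1 event at its end, then take a running sum:

```sql
-- Number of sessions open at each event time
-- (table name "sessions" is assumed)
SELECT event_time,
       SUM(delta) OVER (ORDER BY event_time
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
         AS open_sessions
FROM (
  SELECT start_time AS event_time,  1 AS delta FROM sessions
  UNION ALL
  SELECT end_time   AS event_time, -1 AS delta FROM sessions
) events;
```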

Why does 'get_json_object' return different results when run in Spark and in a SQL tool?

夙愿已清 submitted on 2019-12-13 04:04:03
Question: I have developed a Hive query that uses lateral views and get_json_object to unpack some JSON. The query works well enough using a JDBC client (DbVisualizer) against a Hive database, but when run as Spark SQL from a Java application, on the same data, it returns nothing. I have tracked the problem down to differences in what the function get_json_object returns. The issue can be illustrated by this type of query: select concat_ws("|", get_json_object('{"product_offer":[ {"productName":"Plan…

Getting an error in Hive while running a script

耗尽温柔 submitted on 2019-12-13 03:57:07
Question: I am creating a temp table from another table using an AS clause, where I am including the partition column of the other table as part of the temp table. Below is the CREATE TABLE statement, where col4 is the partition column of table xyz. When I run the CREATE statement I get the error below, and when I remove col4 from the statement it runs fine. Error: Error while compiling statement: FAILED: NumberFormatException For input…
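One workaround worth trying in this situation (a sketch only; the table and column names are hypothetical, and this is not confirmed by the truncated answer above): cast the source table's partition column explicitly in the CTAS so its inferred type cannot cause a conversion error.

```sql
-- CTAS from partitioned table xyz, where col4 is its partition column.
-- Casting col4 explicitly can sidestep type-inference errors.
-- (All table/column names here are illustrative.)
CREATE TABLE temp_xyz AS
SELECT col1, col2, col3,
       CAST(col4 AS STRING) AS col4
FROM xyz;
```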