hiveql

I have created a table in Hive. Which directory is my table created in?

只愿长相守 submitted on 2019-11-27 11:26:45
Question: I have created a table in Hive; I would like to know which directory my table is created in. I would like to know the path...

Answer 1: DESCRIBE FORMATTED my_table; or DESCRIBE FORMATTED my_table PARTITION (my_column='my_value');

Answer 2: There are three ways to describe a table in Hive. 1) To see the primary info of a Hive table, use the describe table_name; command. 2) To see more detailed information about the table, use the describe extended table_name; command. 3) To see code in a clean manner use describe
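A minimal sketch of reading the table directory out of Hive's own metadata, assuming a table named my_table in the current database:

```sql
-- A minimal sketch, assuming a table named my_table exists in the current database.
DESCRIBE FORMATTED my_table;
-- The table's HDFS directory appears in the output row labelled "Location:", e.g.
--   Location:  hdfs://namenode:8020/user/hive/warehouse/my_table
```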

What is the difference between -hivevar and -hiveconf?

依然范特西╮ submitted on 2019-11-27 11:14:37
Question: From hive -h: --hiveconf <property=value> Use value for given property; --hivevar <key=value> Variable substitution to apply to hive commands, e.g. --hivevar A=B

Answer 1: I didn't quite feel like the examples from the documentation were adequate, so here's my attempt at an answer. In the beginning there was only --hiveconf and variable substitution didn't exist. The --hiveconf option allowed users to set Hive configuration values from the command line and that was it. All Hive configuration values
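A hedged illustration of the difference in practice (the variable name run_date, the queue property value, and the events table are all made up for this example):

```sql
-- Invocation (illustrative):
--   hive --hivevar run_date=2019-11-27 \
--        --hiveconf mapreduce.job.queuename=etl \
--        -f daily_report.hql
--
-- --hiveconf sets a Hive/Hadoop configuration property, while --hivevar defines a
-- substitution variable for use inside queries:
SELECT *
FROM events
WHERE ds = '${hivevar:run_date}';

-- Configuration properties can also be substituted, but through the hiveconf namespace:
SELECT '${hiveconf:mapreduce.job.queuename}';
```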

Explode the Array of Struct in Hive

痴心易碎 submitted on 2019-11-27 11:07:07
This is the Hive table:

CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable ( USER_ID BIGINT, NEW_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>> )

And this is the data in the above table:

1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"},{"product_id":300003861266,"timestamps":"1340271857000"}]

Is there any way I can get the below output from HiveQL after exploding the array?

**USER_ID** | **PRODUCT_ID** | **TIMESTAMPS**
------------+----------------+----------------
1015826235  | 220003038067   | 1340321132000
1015826235  | 300003861266   | 1340271857000

Updated: I wrote
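A hedged sketch of the usual approach: explode the array once and read the struct fields from the exploded column (the lateral-view aliases are arbitrary):

```sql
-- A minimal sketch against the SampleTable definition above; aliases are arbitrary.
SELECT
  user_id,
  prd.product_id,
  prd.timestamps
FROM SampleTable
LATERAL VIEW explode(new_item) exploded_items AS prd;
```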

Why does partition elimination not happen for this query?

两盒软妹~` submitted on 2019-11-27 08:53:21
Question: I have a Hive table which is partitioned by year, month, day and hour. I need to run a query against it to fetch the last 7 days of data. This is on Hive 0.14.0.2.2.4.2-2. My query currently looks like this:

SELECT COUNT(column_name) from table_name where year >= year(date_sub(from_unixtime(unix_timestamp()), 7)) AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

This takes a very long time. When I substitute the
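One commonly suggested workaround, sketched here as an assumption-laden example: unix_timestamp() is non-deterministic, which prevents the expression from being folded to a constant for pruning, so the cutoff date is computed outside Hive and passed in (cutoff is a made-up variable name, and the partition values are assumed to be zero-padded strings):

```sql
-- Invocation (illustrative): hive --hivevar cutoff=2015-04-01 -f last_seven_days.hql
-- The pruner can evaluate an expression over partition columns against a literal,
-- assuming year/month/day are stored zero-padded (e.g. 2015 / 04 / 01).
SELECT COUNT(column_name)
FROM table_name
WHERE concat(year, '-', month, '-', day) >= '${hivevar:cutoff}';
```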

Hive: Does the order of the data records matter when joining tables?

微笑、不失礼 submitted on 2019-11-27 07:24:12
Question: I would like to know whether the order of the data records matters (performance-wise) when joining two tables. P.S. I am not using any map-side join or bucket join. Thank you!

Answer 1: On the one hand, order should not matter, because during a shuffle join the files are read by mappers in parallel; a file may also be split across several mappers, or vice versa, one mapper may read several files, and the mapper output is then passed to each reducer. And even if the data was ordered, it is read and distributed not in it
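A hedged aside, since the answer above concerns plain shuffle joins: physical ordering only starts to pay off when both tables are bucketed and sorted on the join key, which lets Hive use a sort-merge-bucket join. The settings below are the usual switches for that path, shown here only for contrast:

```sql
-- Illustrative only: these enable sort-merge-bucket joins when both tables are
-- bucketed and sorted on the join key; they do not change a plain shuffle join.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.auto.convert.sortmerge.join = true;
```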

Hive Explode / Lateral View multiple arrays

本小妞迷上赌 submitted on 2019-11-27 03:51:54
I have a Hive table with the following schema:

COOKIE  | PRODUCT_ID | CAT_ID     | QTY
1234123 | [1,2,3]    | [r,t,null] | [2,1,null]

How can I normalize the arrays so I get the following result?

COOKIE  | PRODUCT_ID | CAT_ID | QTY
1234123 | [1]        | [r]    | [2]
1234123 | [2]        | [t]    | [1]
1234123 | [3]        | null   | null

I have tried the following:

select concat_ws('|',visid_high,visid_low) as cookie, pid, catid, qty from table lateral view explode(productid) ptable as pid lateral view explode(catalogId) ptable2 as catid lateral view explode(qty) ptable3 as qty

however the result comes out as a Cartesian product. You can use the numeric
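A hedged sketch of the index-based approach (keeping the column names from the attempted query, with my_table as a placeholder table name): posexplode() emits a position alongside each element, so the remaining arrays can be subscripted by that position instead of being exploded into a Cartesian product:

```sql
-- posexplode() is available in newer Hive releases; column names follow the query above.
SELECT
  concat_ws('|', visid_high, visid_low) AS cookie,
  pid,
  catalogid[pos]                        AS catid,
  qty[pos]                              AS qty
FROM my_table
LATERAL VIEW posexplode(productid) ptable AS pos, pid;
```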

Error in Hive Query while joining tables

不羁岁月 submitted on 2019-11-27 01:40:54
Question: I am unable to pass the equality check using the below Hive query. I have 3 tables and I want to join them. I am trying as below, but get this error:

FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN 'visit_date'

select t1.*, t99.* from table1 t1 JOIN (select v3.*, t3.* from table2 v3 JOIN table3 t3 ON ( v3.AS_upc = t3.upc_no AND v3.start_dt <= t3.visit_date AND v3.end_dt >= t3.visit_date AND v3.adv_price <= t3.comp_price ) ) t99 ON (t1.comp_store_id
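A hedged sketch of the usual fix for this error: older Hive versions only accept equality predicates in the JOIN ... ON clause, so the range comparisons move into WHERE for the inner subquery (for an inner join this filters the same rows):

```sql
-- Only the equality predicate stays in ON; the non-equi conditions move to WHERE.
SELECT v3.*, t3.*
FROM table2 v3
JOIN table3 t3
  ON v3.AS_upc = t3.upc_no
WHERE v3.start_dt  <= t3.visit_date
  AND v3.end_dt    >= t3.visit_date
  AND v3.adv_price <= t3.comp_price;
```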

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

生来就可爱ヽ(ⅴ<●) submitted on 2019-11-27 01:17:35
I have configured Hive as shown in this link: http://www.youtube.com/watch?v=Dqo1ahdBK_A , but I am getting the following error while creating a table in Hive. I am using hadoop-1.2.1 and hive-0.12.0.

hive> create table employee(emp_id int,name string,salary double);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Looks like a problem with your metastore. If you are using the default Hive metastore, embedded Derby, a lock file would be there in case of abnormal

Add minutes to datetime in Hive

狂风中的少年 submitted on 2019-11-26 23:38:13
Question: Is there a function in Hive one could use to add minutes (as an int) to a datetime, similar to DATEADD(datepart, number, date) in SQL Server, where datepart can be minutes: DATEADD(minute, 2, '2014-07-06 01:28:02') returns 2014-07-06 01:30:02. On the other hand, Hive's date_add(string startdate, int days) works in days. Is there anything like that for hours?

Answer 1: Your problem can easily be solved with a Hive UDF. package HiveUDF; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import
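If second-level precision via Unix time is acceptable, the minutes can also be added without writing a UDF; a hedged sketch:

```sql
-- Convert to seconds since the epoch, add the minutes, and convert back.
SELECT from_unixtime(unix_timestamp('2014-07-06 01:28:02') + 2 * 60);
-- -> 2014-07-06 01:30:02
```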

How to record created_at and updated_at timestamps in Hive?

不问归期 submitted on 2019-11-26 23:32:15
Question: MySQL can automatically record created_at and updated_at timestamps. Does Hive provide similar mechanisms? If not, what would be the best way to achieve this functionality?

Answer 1: Hive does not provide such a mechanism. You can achieve this by using a UDF in your select: from_unixtime(unix_timestamp()) as created_at. Note this will be executed in each mapper or reducer and may return different values. If you need the same value for the whole dataset (for Hive versions before 1.2.0), pass the variable
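A hedged sketch of stamping rows at load time with that expression (target_table and source_table are placeholder names, and the per-task caveat above still applies):

```sql
-- Append a created_at column while loading; unix_timestamp() is evaluated per task,
-- so values may differ slightly across mappers/reducers as noted above.
INSERT INTO TABLE target_table
SELECT
  src.*,
  from_unixtime(unix_timestamp()) AS created_at
FROM source_table src;
```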