hiveql

I have created a table in Hive. Which directory is my table created in?

只愿长相守 submitted on 2019-11-27 11:26:45
Question: I have created a table in Hive; I would like to know which directory my table is created in. I would like to know the path...

Answer 1: DESCRIBE FORMATTED my_table; or DESCRIBE FORMATTED my_table PARTITION (my_column='my_value');

Answer 2: There are three ways to describe a table in Hive. 1) To see the primary info of a Hive table, use the describe table_name; command. 2) To see more detailed information about the table, use the describe extended table_name; command. 3) To see code in a clean manner use describe
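A minimal sketch of reading the table directory out of Hive's own metadata, assuming a table named my_table in the current database:

```sql
-- A minimal sketch, assuming a table named my_table exists in the current database.
DESCRIBE FORMATTED my_table;
-- The table's HDFS directory appears in the output row labelled "Location:", e.g.
--   Location:  hdfs://namenode:8020/user/hive/warehouse/my_table
```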

What is the difference between -hivevar and -hiveconf?

依然范特西╮ submitted on 2019-11-27 11:14:37
Question: From hive -h: --hiveconf <property=value> Use value for given property; --hivevar <key=value> Variable substitution to apply to hive commands, e.g. --hivevar A=B

Answer 1: I didn't quite feel like the examples from the documentation were adequate, so here's my attempt at an answer. In the beginning there was only --hiveconf and variable substitution didn't exist. The --hiveconf option allowed users to set Hive configuration values from the command line and that was it. All Hive configuration values
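A hedged illustration of the difference in practice (the variable name run_date, the queue property value, and the events table are all made up for this example):

```sql
-- Invocation (illustrative):
--   hive --hivevar run_date=2019-11-27 \
--        --hiveconf mapreduce.job.queuename=etl \
--        -f daily_report.hql
--
-- --hiveconf sets a Hive/Hadoop configuration property, while --hivevar defines a
-- substitution variable for use inside queries:
SELECT *
FROM events
WHERE ds = '${hivevar:run_date}';

-- Configuration properties can also be substituted, but through the hiveconf namespace:
SELECT '${hiveconf:mapreduce.job.queuename}';
```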

Explode the Array of Struct in Hive

痴心易碎 submitted on 2019-11-27 11:07:07
This is the Hive table:

CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable ( USER_ID BIGINT, NEW_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>> )

And this is the data in the above table:

1015826235 [{"product_id":220003038067,"timestamps":"1340321132000"},{"product_id":300003861266,"timestamps":"1340271857000"}]

Is there any way I can get the below output from HiveQL after exploding the array?

**USER_ID** | **PRODUCT_ID** | **TIMESTAMPS**
------------+----------------+----------------
1015826235  | 220003038067   | 1340321132000
1015826235  | 300003861266   | 1340271857000

Updated: I wrote
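A hedged sketch of the usual approach: explode the array once and read the struct fields from the exploded column (the lateral-view aliases are arbitrary):

```sql
-- A minimal sketch against the SampleTable definition above; aliases are arbitrary.
SELECT
  user_id,
  prd.product_id,
  prd.timestamps
FROM SampleTable
LATERAL VIEW explode(new_item) exploded_items AS prd;
```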

Why does partition elimination not happen for this query?

两盒软妹~` submitted on 2019-11-27 08:53:21
Question: I have a Hive table which is partitioned by year, month, day and hour. I need to run a query against it to fetch the last 7 days of data. This is on Hive 0.14.0.2.2.4.2-2. My query currently looks like this:

SELECT COUNT(column_name) from table_name where year >= year(date_sub(from_unixtime(unix_timestamp()), 7)) AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

This takes a very long time. When I substitute the
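One commonly suggested workaround, sketched here as an assumption-laden example: unix_timestamp() is non-deterministic, which prevents the expression from being folded to a constant for pruning, so the cutoff date is computed outside Hive and passed in (cutoff is a made-up variable name, and the partition values are assumed to be zero-padded strings):

```sql
-- Invocation (illustrative): hive --hivevar cutoff=2015-04-01 -f last_seven_days.hql
-- The pruner can evaluate an expression over partition columns against a literal,
-- assuming year/month/day are stored zero-padded (e.g. 2015 / 04 / 01).
SELECT COUNT(column_name)
FROM table_name
WHERE concat(year, '-', month, '-', day) >= '${hivevar:cutoff}';
```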

Hive: Does the order of the data records matter when joining tables?

微笑、不失礼 submitted on 2019-11-27 07:24:12
Question: I would like to know whether the order of the data records matters (performance-wise) when joining two tables. P.S. I am not using any map-side join or bucket join. Thank you!

Answer 1: On the one hand, order should not matter, because during a shuffle join the files are read by mappers in parallel; a file may also be split across several mappers, or vice versa, one mapper may read several files, and the mapper output is then passed to each reducer. And even if the data was ordered, it is read and distributed not in it
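A hedged aside, since the answer above concerns plain shuffle joins: physical ordering only starts to pay off when both tables are bucketed and sorted on the join key, which lets Hive use a sort-merge-bucket join. The settings below are the usual switches for that path, shown here only for contrast:

```sql
-- Illustrative only: these enable sort-merge-bucket joins when both tables are
-- bucketed and sorted on the join key; they do not change a plain shuffle join.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.auto.convert.sortmerge.join = true;
```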

Hive Explode / Lateral View multiple arrays

本小妞迷上赌 submitted on 2019-11-27 03:51:54
I have a Hive table with the following schema:

COOKIE  | PRODUCT_ID | CAT_ID     | QTY
1234123 | [1,2,3]    | [r,t,null] | [2,1,null]

How can I normalize the arrays so I get the following result?

COOKIE  | PRODUCT_ID | CAT_ID | QTY
1234123 | [1]        | [r]    | [2]
1234123 | [2]        | [t]    | [1]
1234123 | [3]        | null   | null

I have tried the following:

select concat_ws('|',visid_high,visid_low) as cookie, pid, catid, qty from table lateral view explode(productid) ptable as pid lateral view explode(catalogId) ptable2 as catid lateral view explode(qty) ptable3 as qty

however the result comes out as a Cartesian product. You can use the numeric
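A hedged sketch of the index-based approach (keeping the column names from the attempted query, with my_table as a placeholder table name): posexplode() emits a position alongside each element, so the remaining arrays can be subscripted by that position instead of being exploded into a Cartesian product:

```sql
-- posexplode() is available in newer Hive releases; column names follow the query above.
SELECT
  concat_ws('|', visid_high, visid_low) AS cookie,
  pid,
  catalogid[pos]                        AS catid,
  qty[pos]                              AS qty
FROM my_table
LATERAL VIEW posexplode(productid) ptable AS pos, pid;
```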

Error in Hive Query while joining tables

不羁岁月 submitted on 2019-11-27 01:40:54
Question: I am unable to pass the equality check using the below Hive query. I have 3 tables and I want to join them. I am trying as below, but get this error:

FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN 'visit_date'

select t1.*, t99.* from table1 t1 JOIN (select v3.*, t3.* from table2 v3 JOIN table3 t3 ON ( v3.AS_upc = t3.upc_no AND v3.start_dt <= t3.visit_date AND v3.end_dt >= t3.visit_date AND v3.adv_price <= t3.comp_price ) ) t99 ON (t1.comp_store_id
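A hedged sketch of the usual fix for this error: older Hive versions only accept equality predicates in the JOIN ... ON clause, so the range comparisons move into WHERE for the inner subquery (for an inner join this filters the same rows):

```sql
-- Only the equality predicate stays in ON; the non-equi conditions move to WHERE.
SELECT v3.*, t3.*
FROM table2 v3
JOIN table3 t3
  ON v3.AS_upc = t3.upc_no
WHERE v3.start_dt  <= t3.visit_date
  AND v3.end_dt    >= t3.visit_date
  AND v3.adv_price <= t3.comp_price;
```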

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

生来就可爱ヽ(ⅴ<●) submitted on 2019-11-27 01:17:35
I have configured Hive as shown in this link: http://www.youtube.com/watch?v=Dqo1ahdBK_A , but I am getting the following error while creating a table in Hive. I am using hadoop-1.2.1 and hive-0.12.0.

hive> create table employee(emp_id int,name string,salary double);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Looks like a problem with your metastore. If you are using the default Hive metastore, embedded Derby, a lock file would be there in case of abnormal

Add minutes to datetime in Hive

狂风中的少年 submitted on 2019-11-26 23:38:13
Question: Is there a function in Hive one could use to add minutes (as an int) to a datetime, similar to DATEADD(datepart, number, date) in SQL Server, where datepart can be minutes: DATEADD(minute, 2, '2014-07-06 01:28:02') returns 2014-07-06 01:30:02. On the other hand, Hive's date_add(string startdate, int days) works in days. Is there anything like that for hours?

Answer 1: Your problem can easily be solved with a Hive UDF. package HiveUDF; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import
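If second-level precision via Unix time is acceptable, the minutes can also be added without writing a UDF; a hedged sketch:

```sql
-- Convert to seconds since the epoch, add the minutes, and convert back.
SELECT from_unixtime(unix_timestamp('2014-07-06 01:28:02') + 2 * 60);
-- -> 2014-07-06 01:30:02
```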

How to record created_at and updated_at timestamps in Hive?

不问归期 submitted on 2019-11-26 23:32:15
Question: MySQL can automatically record created_at and updated_at timestamps. Does Hive provide similar mechanisms? If not, what would be the best way to achieve this functionality?

Answer 1: Hive does not provide such a mechanism. You can achieve this by using a UDF in your select: from_unixtime(unix_timestamp()) as created_at. Note this will be executed in each mapper or reducer and may return different values. If you need the same value for the whole dataset (for Hive versions before 1.2.0), pass the variable
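A hedged sketch of stamping rows at load time with that expression (target_table and source_table are placeholder names, and the per-task caveat above still applies):

```sql
-- Append a created_at column while loading; unix_timestamp() is evaluated per task,
-- so values may differ slightly across mappers/reducers as noted above.
INSERT INTO TABLE target_table
SELECT
  src.*,
  from_unixtime(unix_timestamp()) AS created_at
FROM source_table src;
```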