hiveql

HiveQL: Using query results as variables

Submitted by 假装没事ソ on 2019-11-26 22:56:45
Question: In Hive I'd like to dynamically extract information from a table, save it in a variable, and use it further on. Consider the following example, where I retrieve the maximum of column var and want to use it as a condition in a subsequent query:

set maximo=select max(var) from table;
select * from table where var=${hiveconf:maximo}

It does not work, although

set maximo=select max(var) from table;
${hiveconf:maximo}

shows me the intended result. Doing select '${hiveconf:maximo}' gives "select max(var) from table", though. Best

Answer: Hive substitutes variables as-is and does not execute them. Use shell
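The shell workaround the answer alludes to can be sketched as follows. This is a minimal illustration, assuming the hive CLI is on the PATH; the -S (silent) flag makes hive -e print only the query result, which the shell captures and feeds back in as a hiveconf variable:

```shell
# Run the aggregate query first and capture its scalar result in a
# shell variable; single quotes keep the shell from touching ${...}.
maximo=$(hive -S -e 'select max(var) from table')

# Pass the captured value back to Hive as a substitution variable.
hive -hiveconf maximo="$maximo" \
     -e 'select * from table where var=${hiveconf:maximo}'
```

This works because the substitution now happens in the shell, where the first query has actually been executed, rather than inside Hive, which only performs textual replacement.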

Are Hive's implicit joins always inner joins?

Submitted by 安稳与你 on 2019-11-26 21:57:19
Question: The join documentation for Hive encourages the use of implicit joins, i.e.

SELECT * FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';

Is this equivalent to

SELECT t1.*, t2.*, t3.* FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
INNER JOIN table3 t3 ON t2.id = t3.id
WHERE t1.zipcode = '02535'

or will the above return additional records?

Answer 1: Not always. Your queries are equivalent. But without WHERE t1.id = t2.id AND t2.id = t3.id it
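As a sketch of the distinction the answer is drawing (table and column names as in the question): the comma syntax behaves as an inner join only when the join predicates are present in the WHERE clause; dropping them leaves a cross product.

```sql
-- Without predicates, the comma syntax is a cross join:
SELECT t1.*, t2.*
FROM table1 t1, table2 t2;              -- |table1| x |table2| rows

-- With the predicate, it is equivalent to the explicit inner join:
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id;  -- only matching rows
```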

How do I output the results of a HiveQL query to CSV?

Submitted by 好久不见. on 2019-11-26 21:25:21
Question: We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way? Thanks!

Answer: Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.
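One commonly used variant, sketched here with an illustrative path and delimiter: INSERT OVERWRITE [LOCAL] DIRECTORY writes to a directory (not to a single file, which is why '/home/output.csv' seems to vanish), and an explicit row format yields delimiter-separated output.

```sql
-- Writes one or more files (e.g. 000000_0) under the directory;
-- the path names a directory, not a .csv file.
INSERT OVERWRITE LOCAL DIRECTORY '/home/output'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT books FROM table;
```

The resulting files can then be concatenated or renamed outside Hive to produce a single CSV.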

Joining two Tables in Hive using HiveQL(Hadoop) [duplicate]

Submitted by 你离开我真会死。 on 2019-11-26 21:02:49
Question: Possible Duplicate: SQL Query JOIN with Table

CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable1 -- this is the MAIN table, through which comparisons need to be made
(
  BUYER_ID BIGINT,
  ITEM_ID BIGINT,
  CREATED_TIME STRING
)

And this is the data in the above first table:

BUYER_ID   | ITEM_ID      | CREATED_TIME
-----------+--------------+---------------------
1015826235 | 220003038067 | 2001-11-03 19:40:21
1015826235 | 300003861266 | 2001-11-08 18:19:59
1015826235 | 140002997245 | 2003-08-22
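The question is cut off before the second table appears, so the following is a hypothetical sketch only: a HiveQL join of TestingTable1 against some second table on BUYER_ID. TestingTable2 and its USER_ID column are invented for illustration and do not come from the question.

```sql
SELECT t1.BUYER_ID, t1.ITEM_ID, t1.CREATED_TIME
FROM TestingTable1 t1
JOIN TestingTable2 t2           -- hypothetical second table
  ON t1.BUYER_ID = t2.USER_ID;  -- hypothetical join key
```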

Execute Hive Query with IN clause parameters in parallel

Submitted by 烈酒焚心 on 2019-11-26 20:57:18
Question: I have a Hive query like the one below:

select a.x as column from table1 a where a.y in (<long comma-separated list of parameters>)
union all
select b.x as column from table2 b where b.y in (<long comma-separated list of parameters>)

I have set hive.exec.parallel to true, which helps me achieve parallelism between the two queries on either side of the union all. But my IN clause has many comma-separated values, and each value is handled once in one job, then the next value; this part actually executes sequentially. Is there any Hive parameter which, if enabled, can help me fetch data in parallel
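One rewrite that is sometimes used to avoid a long IN list is to materialize the parameter values as a small table and join against it, so each scan is a single job rather than a sequence of per-value comparisons. This is a sketch; the params table is an assumption, not something from the question.

```sql
-- params(y) holds the values that previously sat in the IN (...) list.
SELECT a.x AS col
FROM table1 a
JOIN params p ON a.y = p.y
UNION ALL
SELECT b.x AS col
FROM table2 b
JOIN params p ON b.y = p.y;
```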

Explode (transpose?) multiple columns in Spark SQL table

Submitted by 限于喜欢 on 2019-11-26 20:36:25
Question: I am using Spark SQL (I mention that it is in Spark in case that affects the SQL syntax - I'm not familiar enough to be sure yet) and I have a table that I am trying to re-structure, but I'm getting stuck trying to transpose multiple columns at the same time. Basically I have data that looks like:

userId | someString | varA     | varB
-------+------------+----------+-----------
1      | "example1" | [0,2,5]  | [1,2,9]
2      | "example2" | [1,20,5] | [9,null,6]

and I'd like to explode both varA and varB simultaneously (the length will always be consistent) - so that the final output looks like this:

userId | someString | varA | varB
-------+------------+------+------
1      | "example1" | 0    | 1
1      | "example1" | 2    | 2
1
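A common way to explode two arrays in lockstep in HiveQL/Spark SQL is posexplode, which emits each element together with its index; equating the indices keeps the two arrays aligned instead of producing a cross product. A sketch, assuming the table is named tbl:

```sql
SELECT userId, someString, a.va AS varA, b.vb AS varB
FROM tbl
LATERAL VIEW posexplode(varA) a AS pa, va
LATERAL VIEW posexplode(varB) b AS pb, vb
WHERE pa = pb;   -- keep only position-aligned pairs
```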

How to export data from Spark SQL to CSV

Submitted by 江枫思渺然 on 2019-11-26 19:18:43
Question: This command works with HiveQL:

insert overwrite directory '/data/home.csv' select * from testtable;

But with Spark SQL I'm getting an error with an org.apache.spark.sql.hive.HiveQl stack trace:

java.lang.RuntimeException: Unsupported language features in query:
insert overwrite directory '/data/home.csv' select * from testtable

Please guide me in writing the export-to-CSV feature in Spark SQL.

Answer 1: You can use the statement below to write the contents of a dataframe in CSV format: df.write.csv("/data
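Besides df.write.csv, newer Spark versions (2.3 and later) also accept a directory insert with an explicit file format directly in SQL, which sidesteps the parser error above. A sketch with an illustrative path:

```sql
INSERT OVERWRITE DIRECTORY '/data/home_csv'
USING csv
OPTIONS (header 'true')
SELECT * FROM testtable;
```

As with Hive's INSERT OVERWRITE DIRECTORY, the path names a directory and Spark writes one file per partition underneath it.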

Difference between Hive internal tables and external tables?

Submitted by 霸气de小男生 on 2019-11-26 14:53:09
Question: Can anyone tell me the difference between Hive's external tables and internal tables? I know the difference shows up when dropping the table, but I don't understand what is meant by "the data and metadata are deleted for internal tables, and only the metadata is deleted for external tables". Can anyone explain it to me in terms of nodes, please?

prestomation: Hive has a relational database on the master node that it uses to keep track of state. For instance, when you run CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/';, this table schema is stored in the database. If you have a partitioned table, the partitions are stored in the
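The difference in drop semantics can be sketched as follows (table names and the HDFS path are illustrative):

```sql
-- Managed (internal): Hive owns the data. DROP removes both the
-- metastore entry and the files in the warehouse directory.
CREATE TABLE managed_foo (foo STRING);

-- External: Hive only records the schema and location. DROP removes
-- the metastore entry but leaves the files at LOCATION untouched.
CREATE EXTERNAL TABLE external_foo (foo STRING)
LOCATION 'hdfs:///tmp/foo';

DROP TABLE managed_foo;    -- data files are deleted
DROP TABLE external_foo;   -- files under hdfs:///tmp/foo remain
```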

Hive Explode / Lateral View multiple arrays

Submitted by 情到浓时终转凉″ on 2019-11-26 10:57:35
Question: I have a Hive table with the following schema:

COOKIE  | PRODUCT_ID | CAT_ID     | QTY
1234123 | [1,2,3]    | [r,t,null] | [2,1,null]

How can I normalize the arrays so I get the following result?

COOKIE  | PRODUCT_ID | CAT_ID | QTY
1234123 | [1]        | [r]    | [2]
1234123 | [2]        | [t]    | [1]
1234123 | [3]        | null   | null

I have tried the following:

select concat_ws('|',visid_high,visid_low) as cookie, pid, catid, qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid
lateral view
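Independent explode calls, as attempted above, produce a cross product of the three arrays. posexplode with a position-equality filter keeps the arrays aligned instead; a sketch using the question's column names (the alias qy avoids shadowing the qty column):

```sql
SELECT concat_ws('|', visid_high, visid_low) AS cookie,
       p.pid, c.catid, q.qy AS qty
FROM table
LATERAL VIEW posexplode(productid) p AS p_pos, pid
LATERAL VIEW posexplode(catalogId) c AS c_pos, catid
LATERAL VIEW posexplode(qty)       q AS q_pos, qy
WHERE p_pos = c_pos AND c_pos = q_pos;  -- align by array index
```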

How to set variables in HIVE scripts

Submitted by 自古美人都是妖i on 2019-11-26 10:12:45
Question: I'm looking for the SQL equivalent of SET varname = value in Hive QL. I know I can do something like this:

SET CURRENT_DATE = '2012-09-16';
SELECT * FROM foo WHERE day >= @CURRENT_DATE

But then I get this error:

character '@' not supported here

libjack: You need to use the special hiveconf namespace for variable substitution, e.g.

hive> set CURRENT_DATE='2012-09-16';
hive> select * from foo where day >= '${hiveconf:CURRENT_DATE}';

Similarly, you can pass it on the command line:

% hive -hiveconf CURRENT_DATE='2012-09-16' -f test.hql

Note that there are env and system variables as well, so you can reference ${env
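For completeness, the same substitution also works through the hivevar namespace, which is conventionally used for user-defined variables (hiveconf being intended for configuration settings); a sketch:

```sql
SET hivevar:CURRENT_DATE=2012-09-16;
SELECT * FROM foo WHERE day >= '${hivevar:CURRENT_DATE}';
```

On the command line the equivalent flag is --hivevar CURRENT_DATE='2012-09-16'.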