hiveql

Hive LEFT SEMI JOIN for 'NOT EXISTS'

白昼怎懂夜的黑 submitted on 2019-12-18 20:05:15
Question: I have two tables, each with a single key column. The keys in table a are a subset of the keys in table b. I need to select the keys from table b that are NOT in table a. Here is a quote from the Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN is that the
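A minimal sketch of two common ways to express "keys in b that are not in a" in HiveQL. The table names a and b and the column name key are assumptions for illustration; the question does not give the actual schema.

```sql
-- Hypothetical tables a(key) and b(key); names are illustrative only.

-- Option 1: LEFT OUTER JOIN, then keep the rows of b that found no match in a.
SELECT b.key
FROM b
LEFT OUTER JOIN a ON b.key = a.key
WHERE a.key IS NULL;

-- Option 2 (Hive 0.13 and later, per the manual text quoted in the question):
-- a correlated NOT EXISTS subquery.
SELECT b.key
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.key = b.key);
```

LEFT SEMI JOIN itself only returns rows that do have a match, so it cannot express the negated case directly.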

Creating Views in Hive with parameter

左心房为你撑大大i submitted on 2019-12-18 17:37:34
Question: I have a table whose rows belong to various dates. I want to create a view that returns the data for a given date: CREATE VIEW newusers AS SELECT DISTINCT T1.uuid FROM user_visit T1 WHERE T1.firstSeen="20140522"; I do not want to hard-code WHERE T1.firstSeen="20140522"; it could be any date, such as 20140525. Is there any way to create a view with the date as a parameter?
Answer 1: I am not really sure that creating a view with such a variable actually works. With Hive 1.2 and onwards, this is
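One possible workaround, sketched under assumptions: rather than parameterizing the view itself, expose firstSeen in the view and pass the date at query time through Hive variable substitution. The view name newusers_by_date and the variable name target_date are hypothetical, and this is not necessarily what the truncated answer goes on to propose.

```sql
-- Expose firstSeen in the view instead of filtering on a fixed date.
-- newusers_by_date is a hypothetical name.
CREATE VIEW newusers_by_date AS
SELECT DISTINCT T1.uuid, T1.firstSeen
FROM user_visit T1;

-- Supply the date when querying; target_date is a hypothetical variable.
SET hivevar:target_date=20140522;

SELECT uuid
FROM newusers_by_date
WHERE firstSeen = '${hivevar:target_date}';
```

Substitution is textual and happens before the statement is compiled, so placing the variable inside the CREATE VIEW statement would store whatever value it had at creation time.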

Creating Views in Hive with parameter

做~自己de王妃 submitted on 2019-12-18 17:37:23
Question: I have a table whose rows belong to various dates. I want to create a view that returns the data for a given date: CREATE VIEW newusers AS SELECT DISTINCT T1.uuid FROM user_visit T1 WHERE T1.firstSeen="20140522"; I do not want to hard-code WHERE T1.firstSeen="20140522"; it could be any date, such as 20140525. Is there any way to create a view with the date as a parameter?
Answer 1: I am not really sure that creating a view with such a variable actually works. With Hive 1.2 and onwards, this is

Delta/Incremental Load in Hive

亡梦爱人 submitted on 2019-12-18 12:44:47
Question: I have the following use case. My application has a table holding multi-year data in an RDBMS. We used Sqoop to bring the data into HDFS and loaded it into a Hive table partitioned by year and month. The application also updates existing records and inserts new records into the RDBMS table daily. The updated records can span historical months. Updated and newly inserted records can be identified by the updated-timestamp field (it will hold the current day's timestamp). The problem is: how to do

Is there a way to transpose data in Hive

六月ゝ 毕业季﹏ submitted on 2019-12-18 11:57:47
Question: This is my table:
    pid  high  medium  low
    1    10    8       6
    2    20    16      12
    3    10    6       4
I want to store this data in another Hive table in the following format:
    pid  priority  value
    1    high      10
    1    medium    8
    1    low       6
    2    high      20
    2    medium    16
    2    low       12
    3    high      10
    3    medium    6
    3    low       4
Answer 1: Yes, there is a way to do this in Hive. You just need to create a map and then explode that map. Query: CREATE TABLE db.new AS SELECT pid, priority, value FROM ( SELECT pid , MAP('high', high, 'medium', medium, 'low', low) AS tmp FROM db.old
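The map-and-explode idea named in the answer can be sketched as follows. The answer's own query is cut off in this listing, so the LATERAL VIEW part and the aliases t and x are an assumed completion, not the original text.

```sql
-- Table and column names follow the question. The LATERAL VIEW clause and the
-- aliases t and x are assumptions, since the original query is truncated.
CREATE TABLE db.new AS
SELECT t.pid, x.priority, x.value
FROM (
  -- Build one map per row: column name -> column value.
  SELECT pid, MAP('high', high, 'medium', medium, 'low', low) AS tmp
  FROM db.old
) t
-- explode() on a map emits one (key, value) row per entry.
LATERAL VIEW explode(t.tmp) x AS priority, value;
```

Each source row yields three output rows, one per map entry, which matches the pid/priority/value layout asked for in the question.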

Is there a way to transpose data in Hive

点点圈 submitted on 2019-12-18 11:57:27
Question: This is my table:
    pid  high  medium  low
    1    10    8       6
    2    20    16      12
    3    10    6       4
I want to store this data in another Hive table in the following format:
    pid  priority  value
    1    high      10
    1    medium    8
    1    low       6
    2    high      20
    2    medium    16
    2    low       12
    3    high      10
    3    medium    6
    3    low       4
Answer 1: Yes, there is a way to do this in Hive. You just need to create a map and then explode that map. Query: CREATE TABLE db.new AS SELECT pid, priority, value FROM ( SELECT pid , MAP('high', high, 'medium', medium, 'low', low) AS tmp FROM db.old

How can I add a timestamp column in hive

依然范特西╮ submitted on 2019-12-18 08:54:31
Question: I have 2 rows like the ones below:
    941 78 252 3008 86412 1718502 257796 2223252 292221 45514 114894
    980 78 258 3064 88318 1785623 269374 2322408 305467 46305 116970
I want to insert the current timestamp while inserting each row. In the end, the rows in my Hive table should look like this:
    941 78 252 3008 86412 1718502 257796 2223252 292221 45514 114894 2014-10-21
    980 78 258 3064 88318 1785623 269374 2322408 305467 46305 116970 2014-10-22
Is there any way I can insert the timestamp directly into Hive without using Pig
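A minimal sketch of one way to stamp rows at insert time in plain HiveQL, assuming the raw rows are first loaded into a staging table. The table names staging_rows and target_rows and the column name load_date are hypothetical; the question does not name its tables.

```sql
-- Hypothetical names: staging_rows holds the raw rows as loaded;
-- target_rows has the same columns plus a trailing load_date column.
INSERT INTO TABLE target_rows
SELECT s.*,
       -- Current date in yyyy-MM-dd form, evaluated when the query runs.
       to_date(from_unixtime(unix_timestamp())) AS load_date
FROM staging_rows s;
```

On Hive 1.2 and later, current_date or current_timestamp can be used in place of the unix_timestamp() expression.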

How to concatenate the elements of an int array into a string in Hive

自古美人都是妖i submitted on 2019-12-18 07:14:44
Question: I'm trying to concatenate the elements of an int array into one string in Hive. The concat_ws function works only on string arrays, so I tried cast(my_int_array as string), but it does not work. Any suggestions?
Answer 1: Try a transform using /bin/cat: from mytable select transform(my_int_array) using '/bin/cat' as (my_int_array); A second option is to alter the table and replace the delimiters: 1) ALTER TABLE mytable CHANGE COLUMN my_int_array my_int_array_string string; 2) SELECT REPLACE(my_int_array_string,
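A different technique from the two options in the answer, sketched under assumptions: explode the array, cast each element to a string, and re-aggregate with collect_list (available from Hive 0.13) so that concat_ws receives a string array. The id column is hypothetical; some column that identifies each row is needed for the GROUP BY.

```sql
-- Hypothetical: mytable has an id column that uniquely identifies each row.
SELECT m.id,
       -- collect_list gathers the cast elements back into an array<string>.
       concat_ws(',', collect_list(CAST(e.elem AS STRING))) AS joined
FROM mytable m
-- One output row per array element.
LATERAL VIEW explode(m.my_int_array) e AS elem
GROUP BY m.id;
```

Two caveats: collect_list does not guarantee that the original element order is preserved, and rows whose array is empty or NULL drop out unless LATERAL VIEW OUTER is used.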

Find last day of a month in Hive

十年热恋 submitted on 2019-12-17 20:45:49
Question: Is there a way to find the last day of a month in Hive, like the Oracle SQL function LAST_DAY(D_Dernier_Jour)? Thanks.
Answer 1: You could make use of the last_day(dateString) UDF provided by Nexr. It returns the last day of the month based on a date string with the yyyy-MM-dd HH:mm:ss pattern. Example: SELECT last_day('2003-03-15 01:22:33') FROM src LIMIT 1; returns 2003-03-31 00:00:00. You need to pull it from their GitHub repository and build it. Their wiki page contains all the info on how to
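For comparison, Hive 1.1.0 and later ship a built-in last_day function, so the external UDF is only needed on older versions. A minimal sketch, assuming a Hive version that also allows SELECT without a FROM clause (0.13 or later):

```sql
-- Built-in last_day (Hive 1.1.0+): the time part of the input is ignored
-- and the result is a yyyy-MM-dd string.
SELECT last_day('2003-03-15 01:22:33');
-- 2003-03-31
```

Unlike the Nexr UDF output shown in the answer, the built-in returns the date without a time component.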

hive Expression Not In Group By Key

橙三吉。 submitted on 2019-12-17 18:52:13
Question: I created a table in Hive. It has the following columns: id bigint, rank bigint, date string. I want to get avg(rank) per month. I can use this command, and it works: select a.lens_id, avg(a.rank) from tableA a group by a.lens_id, year(a.date_saved), month(a.date_saved); However, I also want to get the date information, so I use this command: select a.lens_id, avg(a.rank), a.date_saved from lensrank_archive a group by a.lens_id, year(a.date_saved), month(a.date_saved); It complains: Expression Not In
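The error arises because a.date_saved appears in the SELECT list but is neither a GROUP BY expression nor wrapped in an aggregate. A minimal sketch of one fix, selecting the same year and month expressions that the query groups by (the output aliases yr, mon and avg_rank are assumptions):

```sql
-- Every non-aggregated SELECT expression matches a GROUP BY expression.
SELECT a.lens_id,
       year(a.date_saved)  AS yr,
       month(a.date_saved) AS mon,
       avg(a.rank)         AS avg_rank
FROM lensrank_archive a
GROUP BY a.lens_id, year(a.date_saved), month(a.date_saved);
```

If a single representative date per group is wanted instead, wrapping the column in an aggregate such as min(a.date_saved) also satisfies the rule.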