hiveql

Not able to recover partitions through alter table in Hive 1.2

无人久伴 submitted on 2019-12-12 03:35:39
Question: I am not able to run ALTER TABLE MY_EXTERNAL_TABLE RECOVER PARTITIONS; on Hive 1.2. However, when I run the alternative, MSCK REPAIR TABLE MY_EXTERNAL_TABLE, it just lists the partitions that are missing from the Hive Metastore and does not add them. From the hive-exec source code, I can see under org/apache/hadoop/hive/ql/parse/HiveParser.g:1001:1 that there is no token in the grammar matching RECOVER PARTITIONS. Kindly let me know if there's a way to recover all the partitions after
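A minimal sketch of two ways to register partitions in Apache Hive, where RECOVER PARTITIONS is not part of the grammar (it is the Amazon EMR equivalent of MSCK REPAIR TABLE). The partition column dt and the location path are hypothetical, since the question does not show the table layout.

```sql
-- Option 1: let Hive scan the table location. By default this only picks up
-- directories named in the key=value form (for example .../dt=2019-12-01).
MSCK REPAIR TABLE MY_EXTERNAL_TABLE;

-- Option 2: register a partition explicitly. The column name dt and the
-- path below are assumptions for illustration only.
ALTER TABLE MY_EXTERNAL_TABLE ADD IF NOT EXISTS
  PARTITION (dt = '2019-12-01')
  LOCATION '/hypothetical/path/my_external_table/dt=2019-12-01';

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS MY_EXTERNAL_TABLE;
```

If MSCK reports partitions but does not add them, comparing the directory names against the key=value convention is a reasonable first check.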

Create table partition in Hive for year,month and day

こ雲淡風輕ζ submitted on 2019-12-12 02:26:22
Question: I have my data folder in the structure below, with two years of data (2015-2017): AppData/CountryName/year/month/Day/app1.json For example: AppData/India/2016/07/01/geek.json AppData/India/2016/07/02/geek.json AppData/US/2016/07/01/geek.json I have created an external table with partitions: PARTITIONED BY (Country String, Year String, Month String, day String) After this, I need to add the partitions with an ALTER TABLE statement. ALTER TABLE mytable ADD PARTITION (country='India', year='2016',month='01',
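Because the directories in this layout are not named in the key=value form, each partition likely has to be mapped to its directory explicitly. The following is a sketch for the three sample paths in the question; the leading / on each LOCATION is an assumption, since the question only shows relative paths.

```sql
-- One statement per directory; IF NOT EXISTS makes the script safe to re-run.
-- The absolute path prefix is an assumption, adjust it to the real data root.
ALTER TABLE mytable ADD IF NOT EXISTS
  PARTITION (country = 'India', year = '2016', month = '07', day = '01')
  LOCATION '/AppData/India/2016/07/01';

ALTER TABLE mytable ADD IF NOT EXISTS
  PARTITION (country = 'India', year = '2016', month = '07', day = '02')
  LOCATION '/AppData/India/2016/07/02';

ALTER TABLE mytable ADD IF NOT EXISTS
  PARTITION (country = 'US', year = '2016', month = '07', day = '01')
  LOCATION '/AppData/US/2016/07/01';
```

For two years of daily data, these statements would normally be generated by a script that walks the directory tree rather than written by hand.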

Hive - over (partition by …) with a column not in group by

情到浓时终转凉″ submitted on 2019-12-11 19:06:55
Question: Is it possible to do something like: select avg(count(distinct user_id)) over (partition by some_date) as average_users_per_day from user_activity group by user_type (notably, the partition by column, some_date, is not in the group by columns)? The idea I'm going for is something like: the average users per day by user type. I know how to do it using subqueries (see below), but I'd like to know if there is a nice way using only over (partition by ...) and group by. Notes: From reading this
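The asker's own subquery is cut off in this excerpt, so the following is only one plausible reading of the subquery approach: count distinct users per user type and day first, then average those daily counts per user type. The alias daily and the column users_per_day are names introduced here for illustration.

```sql
-- Inner query: distinct users per (user_type, some_date).
-- Outer query: average of those daily counts for each user_type.
SELECT user_type,
       AVG(users_per_day) AS average_users_per_day
FROM (
  SELECT user_type,
         some_date,
         COUNT(DISTINCT user_id) AS users_per_day
  FROM user_activity
  GROUP BY user_type, some_date
) daily
GROUP BY user_type;
```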

How to handle XML file in hive

寵の児 submitted on 2019-12-11 18:39:02
Question: How do I handle this XML file in Hive? I want only USERNAME and PASSWORD in the output. <?XML version='1.0' ?> <DATA> <USER USERNAME="ABC" FIRSTNAME="RAJ" LASTNAME="KUMAR" PASSWORD="123" /> <USER USERNAME="DEF" FIRSTNAME="VENKAT" LASTNAME="BALAJI" PASSWORD="123" /> </DATA> CREATE TABLE user_xml(USERNAME string,PASSWORD string) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.USERNAME"="/DATA/USER/USERNAME/text()", "column.xpath.PASSWORD"="/DATA/USER

Hive: Cannot insert into table with map column

若如初见. submitted on 2019-12-11 18:34:13
Question: Here is my table: hive> desc test_tab; OK test_map map<string,string> test_date string # Partition Information # col_name data_type comment test_date string Time taken: 0.087 seconds, Fetched: 7 row(s) And here is my insert statement: hive> insert into table test_tab > values ('2018-02-28', map("key","val")); But I get: FAILED: ParseException line 2:0 cannot recognize input near 'values' '(' ''2018-02-28'' in select clause I also tried hive> insert into table test_tab partition (test_date =
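Hive has no literals for complex types in INSERT ... VALUES, so a common workaround is to build the map in a SELECT. This sketch assumes a Hive version that allows SELECT without a FROM clause (0.13 or later); on older versions the same SELECT could read one row from any existing table instead.

```sql
-- Build the map with the map() function in a SELECT rather than in VALUES.
-- The partition column goes in the PARTITION clause, not in the select list.
INSERT INTO TABLE test_tab PARTITION (test_date = '2018-02-28')
SELECT map('key', 'val');
```

Note that the original statement also listed the date before the map, while the table defines test_map first and test_date as the partition column.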

Divide Ids based on quarter and the count either 1 or 0 by determining the quarter

自古美人都是妖i submitted on 2019-12-11 17:17:22
Question: We have two columns, Id and month Id. The output I'm looking for divides the year from month Id at quarter granularity. The activity column should be derived from the quarter. If the id is active, activity should be 1, else 0. If the id appears in any month of the 1st quarter (e.g. only month 1), the activity is still 1. Like this: id month_dt ----------------------------------- 1000000000 2012-03-01 00:00:00.0 1000000000 2015-09-01 00:00:00.0 1000000000 2016-10-01 00:00:00.0 1000000000 2015-11-01 00:00:00.0
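The expected output is cut off in this excerpt, so the following is only a rough sketch of the quarter bucketing described: one row per id, year, and quarter in which the id appears at least once, flagged with activity 1. The table name id_months is hypothetical, and the quarter is derived with month() and ceil() rather than a dedicated quarter function, which may not exist on older Hive versions.

```sql
-- One row per (id, year, quarter) that has at least one month_dt value.
-- id_months is a hypothetical table name; the question does not give one.
SELECT id,
       year(month_dt) AS activity_year,
       CAST(ceil(month(month_dt) / 3.0) AS INT) AS activity_quarter,
       1 AS activity
FROM id_months
GROUP BY id,
         year(month_dt),
         CAST(ceil(month(month_dt) / 3.0) AS INT);
```

Quarters with no rows do not appear in this result; producing explicit 0 rows would need a join against a list of all years and quarters.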

Find the difference in days from previous timestamp in hive

…衆ロ難τιáo~ submitted on 2019-12-11 16:58:51
Question: I want to find the difference in days and populate a new column in my target table. The difference is computed by subtracting the previous date from the current date. Please find the attached screenshot for reference. Thanks. Answer 1: The LAG function gives you the value from the previous row, and DATEDIFF computes the difference between the two dates. select id, function_id, key, pre_date, datediff(pre_date, lag(pre_date, 1) over(order by id)) as days_difference from [Your_Table] Source: https://stackoverflow.com/questions/52831735/find
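To see the LAG and DATEDIFF combination from the answer on concrete rows, here is a small self-contained sketch. The events CTE and its three sample dates are made up for illustration, and the query assumes a Hive version with CTE support and SELECT without FROM (0.13 or later).

```sql
-- Hypothetical sample rows standing in for the real table.
WITH events AS (
  SELECT 1 AS id, '2018-10-01' AS pre_date
  UNION ALL
  SELECT 2 AS id, '2018-10-05' AS pre_date
  UNION ALL
  SELECT 3 AS id, '2018-10-12' AS pre_date
)
SELECT id,
       pre_date,
       -- lag() returns NULL for the first row, so datediff() is NULL there too.
       datediff(pre_date, lag(pre_date, 1) OVER (ORDER BY id)) AS days_difference
FROM events
ORDER BY id;
```

With these sample rows the first row has no previous date, and the other two differ from their predecessors by 4 and 7 days. If the differences should restart per group, a PARTITION BY clause can be added inside OVER.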

In Hive, Need to print mismatch value compared from one Master table in Hive with 3 lookup tables:

徘徊边缘 submitted on 2019-12-11 15:46:18
Question: In Hive, I need one query to compare one master table with three different lookup tables. If a record matches all 3 lookup tables, it should be updated as "Passed". If a record fails to match any one of the tables, it should be updated and the mismatched value should be displayed. Master Table: EMPNO EMPNAME CLASS SCHOOL LOCATION M1 M2 M3 101 SCOTT 4 MVM IDAHO 50 60 80 102 TIGER 7 MIV TEXAS 50 70 80 103 RAYON 3 MOV LONDON 80 75 80 EMPLOYEE: EMPNO

query to divide data

走远了吗. submitted on 2019-12-11 15:46:11
Question: We have two columns, id and monthid. The output I'm looking for divides the year from month Id by quarter. The output column should be derived from the quarter. If the id is active, the output should be 1, else 0. If the id appears in any month of the 1st quarter (e.g. only month 1), the output is still 1. Like this: id month ----------------------------------- 100 2012-03-01 00:00:00.0 100 2015-09-01 00:00:00.0 100 2016-10-01 00:00:00.0 100 2015-11-01 00:00:00.0 100 2014-01-01 00:00:00.0 100 2013-04-01 00:00:00.0 100

TimeStamp issue in hive 1.1

三世轮回 submitted on 2019-12-11 14:21:46
Question: I am facing a very weird issue in Hive in a production environment (Cloudera 5.5) that I cannot reproduce on my local server (I don't know why). For some records I get a wrong timestamp value while inserting from a temp table into the main table, where a string such as "2017-10-21 23" is converted to the timestamp "2017-10-21 23:00:00" during insertion. Example: 2017-10-21 23 -> 2017-10-21 22:00:00 2017-10-22 15 -> 2017-10-22 14:00:00 It happens very infrequently. Means delta value is