hiveql

Hadoop/Hive Collect_list without repeating items

时间秒杀一切 submitted on 2019-12-31 04:29:04
Question: Based on the post Hive 0.12 - Collect_list, I am trying to locate Java code to implement a UDAF that will accomplish this or similar functionality, but without a repeating sequence. For instance, collect_all() returns the sequence A, A, A, B, B, A, C, C; I would like the sequence A, B, A, C returned, with sequentially repeated items removed. Does anyone know of a function in Hive 0.12 that will accomplish this, or has anyone written their own UDAF? As always, thanks for the help. Answer 1: I ran into
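The behavior the asker wants — collapsing only consecutive duplicates, unlike collect_set, which removes all duplicates — can be sketched outside Hive. This is a hypothetical Python illustration of the logic such a UDAF would implement, not Hive code:

```python
from itertools import groupby

def collect_list_dedup_sequential(values):
    """Collapse runs of consecutive duplicates, preserving order.

    Unlike a set-based dedup, a value may reappear later in the
    result as long as the repeats are not adjacent.
    """
    # groupby groups consecutive equal elements; keep one key per run
    return [key for key, _run in groupby(values)]

print(collect_list_dedup_sequential(["A", "A", "A", "B", "B", "A", "C", "C"]))
# → ['A', 'B', 'A', 'C']
```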

How does hashing work in bucketing for Hive?

眉间皱痕 submitted on 2019-12-31 03:52:25
Question: I know the hashing principle for HashMap in Java, so I wanted to know how hashing works in Hive when we bucket the data into various buckets. Answer 1: I recently had to dig into some Hive source code to figure this out for myself. Here's what I found: for an integer field, the hash is just the integer value. For a string, it uses a similar version of Java's String hashCode. When hashing multiple values, the hash is a similar version of Java's List hashCode. Answer 2: Bucketing is used
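As a rough illustration of the scheme the answer describes, here is a hedged Python sketch: the bucket index is computed as (hash & Integer.MAX_VALUE) % numBuckets, where an int hashes to itself and a string hashes like Java's String.hashCode. This mimics the described behavior rather than calling Hive's actual code, so treat it as an approximation:

```python
def java_string_hashcode(s):
    """Java's String.hashCode: h = 31*h + c, on 32-bit signed ints."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits
    # reinterpret as signed 32-bit, as Java would
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(value, num_buckets):
    """Pick a bucket the way Hive does: (hash & MAX_INT) % num_buckets."""
    h = value if isinstance(value, int) else java_string_hashcode(value)
    return (h & 0x7FFFFFFF) % num_buckets

print(bucket_for(42, 8))     # integer hashes to itself → 2
print(bucket_for("abc", 8))  # string hashes like Java's "abc".hashCode()
```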

Hive - Can one extract common options for reuse in other scripts?

非 Y 不嫁゛ submitted on 2019-12-30 07:06:11
Question: I have two Hive scripts which look like this: Script A: SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.parallel=true; ... do something ... Script B: SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.parallel=true; ... do something else ... The options that we set at the beginning of each script are identical. Is it possible to extract them out to a common place (for example, into a

HiveQL - How to find whether a column value is numeric or not using any UDF?

拥有回忆 submitted on 2019-12-30 03:18:08
Question: Basically I would like to return rows based on one column's value: if the column contains non-numeric values, then return those rows from a Hive table. Is any UDF available in Hive? Answer 1: I believe Hive supports rlike (regular expressions). So you can do: where col rlike '[^0-9]' This looks for any non-digit character. You can expand this if your numeric values might have decimal points or commas. Answer 2: Use cast(expr as <type>) . A null is returned if the conversion does not succeed. case
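The regex logic in Answer 1 can be checked outside Hive. This Python sketch mimics how col rlike '[^0-9]' flags values containing any non-digit character; it is an approximation, since Hive's rlike follows Java regex semantics:

```python
import re

def has_non_digit(value):
    """Mimic `col rlike '[^0-9]'`: True if any non-digit char is present."""
    return re.search(r"[^0-9]", value) is not None

rows = ["12345", "12a45", "3.14", ""]
non_numeric = [v for v in rows if has_non_digit(v)]
print(non_numeric)  # "12a45" and "3.14" contain non-digit characters
```

Note that a decimal point counts as a non-digit under this pattern, which is why the answer suggests widening the character class if decimals or commas are valid.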

How can I convert an array to a string in Hive SQL?

只愿长相守 submitted on 2019-12-30 02:39:07
Question: I want to convert an array to a string in Hive. I want to convert collect_set array values to a string without [[""]] . select actor, collect_set(date) as grpdate from actor_table group by actor; so that [["2016-07-01", "2016-07-02"]] becomes 2016-07-01, 2016-07-02. Answer 1: Use the concat_ws(string delimiter, array<string>) function to concatenate the array: select actor, concat_ws(',',collect_set(date)) as grpdate from actor_table group by actor; If the date field is not string, then convert it to
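The effect of concat_ws over collect_set can be sketched in Python: collect_set deduplicates the values, and concat_ws joins them with the delimiter, skipping NULLs. A hypothetical illustration of that pipeline:

```python
def concat_ws(delimiter, values):
    """Rough analog of Hive's concat_ws: join the non-NULL values."""
    return delimiter.join(v for v in values if v is not None)

dates = ["2016-07-01", "2016-07-02", "2016-07-01", None]
# collect_set-style dedup, preserving first-seen order
distinct_dates = list(dict.fromkeys(d for d in dates if d is not None))
grpdate = concat_ws(",", distinct_dates)
print(grpdate)  # → 2016-07-01,2016-07-02
```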

Map type variable in Hive

谁都会走 submitted on 2019-12-30 01:24:29
Question: I am having trouble trying to define a map type in Hive. According to the Hive Manual there definitely is a map type; unfortunately, there aren't any examples of how to use it. :-( Suppose I have a table (users) with the following columns: Name, Ph, CategoryName. This CategoryName column has a specific set of values. Now I want to create a hashtable that maps CategoryName to CategoryID. I tried doing: set hivevar:nameToID=map('A',1,'B',2); I have 2 questions: When I do set hivevar:${nameToID['A']} I
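Hive's map('A',1,'B',2) literal is a key-value constructor taking alternating keys and values. As a hypothetical Python analog of the lookup the asker is trying to get out of the hivevar:

```python
def hive_map(*args):
    """Analog of Hive's map(k1, v1, k2, v2, ...) constructor."""
    if len(args) % 2 != 0:
        raise ValueError("map() needs an even number of arguments")
    # pair up alternating keys and values
    return dict(zip(args[0::2], args[1::2]))

name_to_id = hive_map("A", 1, "B", 2)
print(name_to_id["A"])  # → 1
```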

How to convert the date 2017-sep-12 to 2017-09-12 in Hive

限于喜欢 submitted on 2019-12-29 09:11:29
Question: I am facing an issue converting a date in Hive. I need to convert 2017-sep-12 to 2017-09-12. How can I achieve this in Hive? Answer 1: Use unix_timestamp(string date, string pattern) to convert the given date format to seconds elapsed since 1970-01-01, then use from_unixtime() to convert to the desired format: hive> select from_unixtime(unix_timestamp('2017-sep-12' ,'yyyy-MMM-dd'), 'yyyy-MM-dd'); OK 2017-09-12 Source: https://stackoverflow.com/questions/47301455/how-to-convert-date-2017-sep-12-to-2017-09-12
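The same conversion can be checked with Python's strptime/strftime, using format codes analogous to the Hive patterns (%b matches an abbreviated month name such as "sep", much like MMM in Hive's SimpleDateFormat):

```python
from datetime import datetime

def convert(date_str):
    """Parse 'yyyy-MMM-dd' style input (e.g. 2017-sep-12) and
    reformat it as yyyy-MM-dd."""
    return datetime.strptime(date_str, "%Y-%b-%d").strftime("%Y-%m-%d")

print(convert("2017-sep-12"))  # → 2017-09-12
```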

Array Intersection in Spark SQL

旧巷老猫 submitted on 2019-12-29 08:03:10
Question: I have a table with an array-type column named writer, which has values like array[value1, value2] , array[value2, value3] , etc. I am doing a self join to get results which have common values between the arrays. I tried: sqlContext.sql("SELECT R2.writer FROM table R1 JOIN table R2 ON R1.id != R2.id WHERE ARRAY_INTERSECTION(R1.writer, R2.writer)[0] is not null ") And sqlContext.sql("SELECT R2.writer FROM table R1 JOIN table R2 ON R1.id != R2.id WHERE ARRAY_INTERSECT(R1.writer, R2.writer)[0] is
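The join condition the asker is after — keep pairs of rows whose writer arrays share at least one element — is a set intersection. A hypothetical in-memory Python version of that self join, using made-up sample rows:

```python
rows = [
    (1, ["value1", "value2"]),
    (2, ["value2", "value3"]),
    (3, ["value4"]),
]

# self join on id inequality, keeping pairs with a non-empty intersection
pairs = [
    (r1_id, r2_id)
    for r1_id, w1 in rows
    for r2_id, w2 in rows
    if r1_id != r2_id and set(w1) & set(w2)
]
print(pairs)  # rows 1 and 2 share "value2"
```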

How to update table in Hive 0.13?

﹥>﹥吖頭↗ submitted on 2019-12-28 07:09:42
Question: My Hive version is 0.13. I have two tables, table_1 and table_2.

table_1 contains:

customer_id | items | price | updated_date
------------+-------+-------+-------------
10          | watch | 1000  | 20170626
11          | bat   | 400   | 20170625

table_2 contains:

customer_id | items    | price | updated_date
------------+----------+-------+-------------
10          | computer | 20000 | 20170624

I want to update records in table_2 if the customer_id already exists in it; if not, the row should be appended to table_2. As Hive 0.13 does not
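Since Hive 0.13 lacks UPDATE, the usual workaround is to INSERT OVERWRITE the target from a join or union that prefers the newer row per customer_id. The merge semantics can be sketched in Python, modeling each table as a dict keyed by customer_id (data taken from the question):

```python
table_2 = {
    10: ("computer", 20000, "20170624"),
}
table_1 = {
    10: ("watch", 1000, "20170626"),
    11: ("bat", 400, "20170625"),
}

# upsert: rows from table_1 replace matching customer_ids in table_2,
# and brand-new customer_ids are appended
merged = {**table_2, **table_1}
print(sorted(merged.items()))
```

In SQL terms this is the "overwrite with the winning row per key" pattern; the dict merge just makes the precedence rule (incoming data wins) explicit.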
