hiveql

Implementing Limit query in Hive

不想你离开。 submitted on 2019-12-13 02:27:30
Question: For my requirement I have to implement an upper and lower limit in Hive. For that I am trying to write queries like these:
SELECT * FROM `your_table` LIMIT 0, 5
SELECT * FROM `your_table` LIMIT 5, 5
But Hive supports only one LIMIT argument; it does not support an upper and lower limit. I tried other alternatives such as RANK() and ROWNUM() but did not succeed. Can anyone please help me solve this? Thanks in advance.
Answer 1: Hi, you can use the Facebook UDF and rownum
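A minimal sketch of two options, assuming a hypothetical column `id` that defines the row order (without ORDER BY, Hive does not guarantee which rows end up in a given "page"). Hive 2.0.0 added an optional offset to LIMIT (`LIMIT offset, rows`), so the queries in the question run there; on older releases, ROW_NUMBER() from the windowing functions (Hive 0.11 and later) produces the same page.

```sql
-- Hive 2.0.0 and later: skip 5 rows, return the next 5.
-- `id` is a hypothetical sort column; ORDER BY makes the page deterministic.
SELECT * FROM your_table ORDER BY id LIMIT 5, 5;

-- Older Hive: number the rows, then keep the wanted range (rows 6 to 10).
SELECT *
FROM (
  SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.id) AS rn
  FROM your_table t
) numbered
WHERE rn BETWEEN 6 AND 10;
```

The ROW_NUMBER() version also returns the helper column `rn`; list the wanted columns explicitly in the outer SELECT to drop it.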

Join 2 tables in Hive using a phone number and a prefix (variable length)

♀尐吖头ヾ submitted on 2019-12-13 01:34:36
Question: I'm trying to match phone numbers to an area using Hive. I have a table (prefmap) that maps a number prefix (prefix) to an area (area), and another table (users) with a list of phone numbers (nb). There is only one match per phone number (no sub-areas). The problem is that the prefixes do not have a fixed length, so I cannot use the substr() UDF, substr(nb,"prefix's length"), in the JOIN's ON() condition to match a substring of the number against a prefix. And when I try to use instr() to find if a
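One way around the fixed-length problem is to move the prefix test out of the ON clause, which older Hive versions restrict to equality conditions, and into a WHERE filter over a cross join. A minimal sketch, assuming `users.nb` and `prefmap.prefix` are both STRING columns and that prefmap is small enough for Hive to broadcast it as a map join:

```sql
-- Pair every number with every prefix, then keep the pairs where the number
-- starts with the prefix. substr() is 1-based: substr(str, start, length).
-- Note: strict-mode settings may reject cartesian products; relax them if needed.
SELECT u.nb, p.prefix, p.area
FROM users u
CROSS JOIN prefmap p
WHERE substr(u.nb, 1, length(p.prefix)) = p.prefix;
```

Because each number is compared against the full prefix list, this relies on prefmap being small. If nested prefixes ever did overlap (for example "33" and "331"), a ROW_NUMBER() over length(p.prefix) DESC per number could keep only the longest match.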

string to date - hive SQL

◇◆丶佛笑我妖孽 submitted on 2019-12-13 01:19:20
Question: I am running queries in a Hive environment. I have a column that holds a timestamp but is stored as a string in the table. I tried the following, and all of them return NULL:
to_date(activitydate)
cast(activitydate as timestamp)
This is how the data is set up in the table. I'd appreciate any input on how I can convert this: 05/12/2017 00:00:00
SELECT cust_id
,to_date(activitydate) activity_date
,type type_of_contact
FROM repl_task
WHERE to_date(activitydate) BETWEEN '2014-01-01' AND
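Both attempts return NULL because to_date() and CAST(... AS TIMESTAMP) expect strings in the form 'yyyy-MM-dd HH:mm:ss', while this column uses 'MM/dd/yyyy HH:mm:ss'. A minimal sketch that parses the string with an explicit pattern first, assuming month comes before day (swap to 'dd/MM/yyyy HH:mm:ss' otherwise):

```sql
-- unix_timestamp(str, pattern) parses with a Java SimpleDateFormat pattern and
-- returns epoch seconds; from_unixtime() renders them as 'yyyy-MM-dd HH:mm:ss',
-- which to_date() accepts.
SELECT cust_id,
       to_date(from_unixtime(unix_timestamp(activitydate, 'MM/dd/yyyy HH:mm:ss'))) AS activity_date,
       type AS type_of_contact
FROM repl_task
-- The upper bound below is a hypothetical placeholder; the question's end date was cut off.
WHERE to_date(from_unixtime(unix_timestamp(activitydate, 'MM/dd/yyyy HH:mm:ss')))
      BETWEEN '2014-01-01' AND '2017-12-31';
```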

get the value from subquery in hive

北城余情 submitted on 2019-12-12 22:08:52
Question: I am trying to parameterise a value in Hive rather than hard-coding it in the query. Below is the query:
select * from employee where sal > 30000
Rather than hard-coding 30000, I need the value to come from the same query, like below, but I am running into issues:
select * from employee where sal > (select max(sal) from employee)
Any help is appreciated. Thanks.
Answer 1: You can try this form of Hive query. It returns the employees whose salary equals the highest salary.
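A minimal sketch of two workarounds. The first is the join form the answer describes: older Hive releases do not accept a scalar subquery in WHERE, so the maximum is computed in a one-row derived table and joined back. The second addresses the original goal of not hard-coding 30000 through Hive variable substitution; `min_sal` and `query.hql` are hypothetical names.

```sql
-- 1) Employees earning the highest salary: compute max(sal) once, then join on it.
SELECT e.*
FROM employee e
JOIN (SELECT max(sal) AS max_sal FROM employee) m
  ON e.sal = m.max_sal;

-- 2) Keep the threshold out of the query text (min_sal is a hypothetical name).
--    It can also be supplied on the command line:
--    hive --hivevar min_sal=30000 -f query.hql
SET hivevar:min_sal=30000;
SELECT * FROM employee WHERE sal > ${hivevar:min_sal};
```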

How to load data to same Hive table if file has different number of columns

淺唱寂寞╮ submitted on 2019-12-12 20:28:28
Question: I have a main table (Employee) with 10 columns, and I can load data into it using:
load data inpath '/file1.txt' into table Employee
My question is how to load into the same table (Employee) when my file file2.txt has the same columns except that columns 3 and 5 are missing. If I load it directly, the last two columns become NULL NULL, but instead the 3rd and 5th columns should be NULL. Suppose I have a table Employee and I want to load both file1.txt and file2.txt into it. file1.txt =====
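LOAD DATA only moves files into place and maps fields by position, so it cannot skip columns. A common pattern is to load the shorter file into a staging table that matches its layout, then INSERT ... SELECT into Employee with NULLs in the missing positions. A minimal sketch under stated assumptions: the column names col1 to col10, the STRING types, the comma delimiter, the staging table name, and the path '/file2.txt' are all hypothetical.

```sql
-- Staging table shaped like file2.txt (columns 3 and 5 absent). Hypothetical names/types.
CREATE TABLE employee_stage_file2 (
  col1 STRING, col2 STRING, col4 STRING, col6 STRING,
  col7 STRING, col8 STRING, col9 STRING, col10 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical path for the second file.
LOAD DATA INPATH '/file2.txt' INTO TABLE employee_stage_file2;

-- Write the rows in Employee's column order, filling positions 3 and 5 with NULL.
INSERT INTO TABLE Employee
SELECT col1, col2, CAST(NULL AS STRING), col4, CAST(NULL AS STRING),
       col6, col7, col8, col9, col10
FROM employee_stage_file2;
```

file1.txt, which already has all ten columns, can still be loaded directly into Employee as before.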

How to get the latest records in hive using rank function

∥☆過路亽.° submitted on 2019-12-12 18:31:04
Question: I have a table in Hive with columns id, name and timestamp. Based on the timestamp, the output should be the latest record:
Answer 1: You don't need rank for this. Your output is described by:
select t.* from t order by t.transaction_time desc limit 3;
EDIT: Oh, you want rank() or dense_rank():
select t.*
from (select t.*, dense_rank() over (order by t.transaction_time desc) as seqnum
      from t
     ) t
where seqnum = 1;
Answer 2: You can use either rank or row_number for this: select * from (
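If "latest record" means the newest row for each id rather than across the whole table, the window needs a PARTITION BY. A minimal sketch, assuming that per-id reading and reusing the names `t` and `transaction_time` from the answers:

```sql
-- Number each id's rows from newest to oldest and keep only the newest one.
SELECT id, name, transaction_time
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY transaction_time DESC) AS rn
  FROM t
) ranked
WHERE rn = 1;
```

ROW_NUMBER() returns exactly one row per id even when two rows share the latest timestamp; switching to RANK() keeps all tied rows instead.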

Hive convert a string to an array of characters

混江龙づ霸主 submitted on 2019-12-12 18:22:28
Question: How can I convert a string to an array of characters, for example "abcd" -> ["a","b","c","d"]? I know the split method:
SELECT split("abcd", "");  -- ["a","b","c","d",""]
Is the trailing empty string a bug? Or are there any other ideas?
Answer 1: This is not actually a bug. Hive's split function simply calls the underlying Java String#split(String regex, int limit) method with the limit parameter set to -1, which causes trailing empty strings to be returned. I'm not going to dig into implementation details on why it's
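Since split() takes a Java regular expression, one workaround is a pattern that matches the empty string everywhere except at the end, so no trailing empty element is produced. A minimal sketch; it relies on Java 8 or later, where a zero-width match at index 0 yields no leading empty string. A second option explodes the plain split and filters out the empty element.

```sql
-- '(?!$)' is a zero-width match at every position except the end of the string,
-- so the string is cut between characters only.
SELECT split('abcd', '(?!$)');

-- One row per character: explode the ordinary split and drop the trailing ''.
SELECT c
FROM (SELECT split('abcd', '') AS chars) s
LATERAL VIEW explode(s.chars) x AS c
WHERE c <> '';
```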

Hive Case Statement for Insert Overwrite Directory

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-12 18:17:35
Question: When attempting to run an HQL script with the following logic, I receive this error:
ParseException line 4:0 cannot recognize input near 'CASE' 'WHEN' 'mytable' in serde properties specification
Script logic:
INSERT OVERWRITE DIRECTORY '/example/path'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
CASE
WHEN ${hiveconf:tbl_name}='mytable' THEN SELECT * FROM ${hiveconf:tbl_name} LEFT OUTER JOIN ...;
WHEN ${hiveconf:tbl_name}='mytable2' THEN SELECT * FROM ${hiveconf:tbl_name} LEFT OUTER JOIN ...;
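The parser fails because CASE is an expression that computes a value inside a query; it is not control flow and cannot choose between whole SELECT statements. One workaround is a UNION ALL whose branches are each guarded by a constant predicate on the substituted variable, so only the matching branch returns rows. A minimal sketch: `lookup_a`, `lookup_b`, and the columns `id`, `name`, `extra` are hypothetical stand-ins for the elided joins, and both branches must return the same column list. Choosing between two separate scripts in the calling shell is the other common approach.

```sql
-- The variable is quoted so that after substitution it is compared as a string literal.
INSERT OVERWRITE DIRECTORY '/example/path'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT u.*
FROM (
  SELECT m.id, m.name, a.extra
  FROM mytable m LEFT OUTER JOIN lookup_a a ON m.id = a.id
  WHERE '${hiveconf:tbl_name}' = 'mytable'
  UNION ALL
  SELECT m.id, m.name, b.extra
  FROM mytable2 m LEFT OUTER JOIN lookup_b b ON m.id = b.id
  WHERE '${hiveconf:tbl_name}' = 'mytable2'
) u;
```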

Are timestamps stored with a timezone in Apache Hive?

时光总嘲笑我的痴心妄想 submitted on 2019-12-12 18:15:53
Question: The following discussion seems to indicate that Hive timestamps have a time zone: https://community.hortonworks.com/questions/83523/timestamp-in-hive-without-timezone.html The Apache wiki says "Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch." I am referring to: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-TimestampstimestampTimestamps If I use code like the following: from_unixtime(unix_timestamp(ts_field,
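The TIMESTAMP type itself carries no zone, as the wiki says, but the functions that convert between strings and epoch seconds do apply a time zone: the local zone in effect for the Hive session (by default the server JVM's zone). That is why the same string can yield different epoch values on differently configured clusters. A minimal sketch that makes the zone explicit instead; the table `events` and the assumption that ts_field was recorded in America/New_York local time are hypothetical.

```sql
-- ts_field: STRING such as '2017-05-12 08:30:00' (hypothetical table and zone).
SELECT
  ts_field,
  -- Interpreted in the session's local time zone, so the result depends on configuration.
  unix_timestamp(ts_field, 'yyyy-MM-dd HH:mm:ss') AS epoch_secs,
  -- Explicit: treat the wall-clock value as New York time and shift it to UTC.
  to_utc_timestamp(ts_field, 'America/New_York') AS utc_ts,
  -- Render that UTC instant as Tokyo wall-clock time.
  from_utc_timestamp(to_utc_timestamp(ts_field, 'America/New_York'), 'Asia/Tokyo') AS tokyo_ts
FROM events;
```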

compare data between two tables with same structure in hive

丶灬走出姿态 submitted on 2019-12-12 16:42:45
Question: How can I compare two tables with the same structure in Hive? I believe MINUS will not work in Hive.
SRC table:
id name
1 A
2 B
3 C
TGT table:
id name
1 A
2 C
3 C
Can anyone help me with a query?
Answer 1: If you are looking for equality between the two tables, and for the differences if any, you can do the following:
SELECT MIN(TableName) as TableName, ID, NAME
FROM (
  SELECT 'SRC_TABLE' as TableName, A.ID, A.NAME FROM A
  UNION ALL
  SELECT 'TGT_TABLE' as TableName, B.ID, B.NAME FROM B
) tmp
GROUP BY ID, NAME
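Adding a HAVING filter to the UNION ALL / GROUP BY approach keeps only the rows that occur in exactly one table, which is what MINUS would report in both directions. A minimal sketch, assuming the tables are named `src` and `tgt` and that neither contains duplicate (id, name) rows (deduplicate each side with SELECT DISTINCT first if they might):

```sql
-- Tag each row with its source table, group identical rows together,
-- and keep groups seen only once, i.e. rows missing from the other table.
SELECT MIN(table_name) AS only_in, id, name
FROM (
  SELECT 'SRC' AS table_name, id, name FROM src
  UNION ALL
  SELECT 'TGT' AS table_name, id, name FROM tgt
) tmp
GROUP BY id, name
HAVING COUNT(*) = 1;
```

With the sample data, this reports (2, B) as only in SRC and (2, C) as only in TGT; (1, A) and (3, C) appear in both tables and are filtered out.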