hiveql

Hive date/timestamp column

元气小坏坏 submitted on 2019-12-07 22:30:39
Question: I have some data on HDFS that I am trying to set up to be queried via Hive. The data is in the form of comma-separated text files. One of the columns in the file is a date/time column with values like this: Wed Aug 29 16:16:58 CDT 2018. When I read the Hive table created with the following script, I get NULL for this column. use test_db; drop table ORDERS; create external table ORDERS( SAMPLE_DT_TM TIMESTAMP ... ) row format delimited fields terminated by ',' stored as
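One possible reason for the NULL: in a delimited text table, Hive's TIMESTAMP type expects values shaped like yyyy-MM-dd HH:mm:ss, which the sample value does not match. A minimal sketch of a workaround, assuming the column is declared as STRING and converted at query time. The table name ORDERS_RAW and the LOCATION path are hypothetical, only the one column shown in the question is included, and how the pattern treats the CDT zone abbreviation may vary with the Hive version:

```sql
-- Hypothetical staging table: keep the raw text and convert when querying.
CREATE EXTERNAL TABLE ORDERS_RAW (
  SAMPLE_DT_TM STRING   -- e.g. Wed Aug 29 16:16:58 CDT 2018
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/orders';   -- placeholder path, not from the question

-- Parse the text with an explicit pattern, then cast the result to TIMESTAMP.
SELECT CAST(
         from_unixtime(unix_timestamp(SAMPLE_DT_TM, 'EEE MMM dd HH:mm:ss zzz yyyy'))
         AS TIMESTAMP
       ) AS sample_ts
FROM ORDERS_RAW;
```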

Hive count tuple?

我的未来我决定 submitted on 2019-12-07 16:56:25
I am pretty new to HiveQL and I am kind of stuck. I have a table with the following schema: one column named res, and three partitions under a partition column named field. create table results( res string) PARTITIONED BY (field STRING); I then imported data into this table: insert overwrite table results PARTITION (field= 'title') SELECT explode(line) AS myNewCol FROM titles ; insert overwrite table results PARTITION (field= 'artist') SELECT explode(line) AS myNewCol FROM artist; insert overwrite table results PARTITION (field= 'albums') SELECT explode(line) AS myNewCol FROM albums; I am trying to

Hive Query: Matching column Values from Array of string to make Flags

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-07 14:39:13
Question: I have some records where every row belongs to some categories (data type: array of string), and a separate list of unique categories (data type: string). I need to match every row against the unique list and create flags for it.
Input:
ID  Category
1   ["Physics","Math"]
2   ["Math"]
3   ["Math","Chemistry"]
4   ["Physics","Computer"]
I also have a separate list of unique categories in a local Excel file, like below:
Unique Category
["Physics"]
["Math"]
["Chemistry"]
["Computer"]
Final Output should look
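A minimal sketch of one way to build such flags, assuming the rows live in a table called records (a hypothetical name) with an id column and a category column of type array<string>, and that the four unique categories are simply written out by hand. The flag column names are also made up for illustration:

```sql
-- array_contains returns true when the array holds the given value;
-- IF turns that into a 1/0 flag, one column per unique category.
SELECT
  id,
  IF(array_contains(category, 'Physics'),   1, 0) AS physics_flag,
  IF(array_contains(category, 'Math'),      1, 0) AS math_flag,
  IF(array_contains(category, 'Chemistry'), 1, 0) AS chemistry_flag,
  IF(array_contains(category, 'Computer'),  1, 0) AS computer_flag
FROM records;
```

With a long or changing category list, hard-coded columns become awkward, and loading the list into its own table may be the better route.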

How to get the Date of the first day of a week given a time stamp in Hadoop Hive?

久未见 submitted on 2019-12-07 12:44:32
Besides writing a custom UDF for this, are there any known methods of achieving it? I'm currently using Hive 0.13.
Boris Ka: date_sub(m.invitationdate,pmod(datediff(m.invitationdate,'1900-01-07'),7)) This expression gives the exact solution to my question. Regards, Boris
This is the easiest and best solution for fetching the date of the first day of the week. For the current timestamp: select date_sub(from_unixtime(unix_timestamp()), cast(from_unixtime(unix_timestamp(), 'u') AS int)) ; For any given date or column: select date_sub(from_unixtime(unix_timestamp('2017-05-15','yyyy-MM-dd')), cast
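A short worked example of the pmod expression quoted above, using the literal date 2017-05-15 (a Monday) in place of the m.invitationdate column. The anchor date 1900-01-07 fell on a Sunday, so the expression rolls back to the most recent Sunday. The second query is a variation not taken from the answers: it swaps in 1900-01-01, a Monday, as the anchor for weeks that start on Monday.

```sql
-- Days since the anchor Sunday, modulo 7, is the number of days to step back.
SELECT date_sub('2017-05-15', pmod(datediff('2017-05-15', '1900-01-07'), 7));
-- 2017-05-14 (the preceding Sunday)

-- Variation (assumption): anchor on a Monday to get Monday as the first day.
SELECT date_sub('2017-05-15', pmod(datediff('2017-05-15', '1900-01-01'), 7));
-- 2017-05-15 (already a Monday)
```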

MismatchedTokenException on hive create table query

[亡魂溺海] submitted on 2019-12-07 11:45:25
Question: I'm trying to create a Hive table with the following query: CREATE TABLE IF NOT EXISTS BXDataSet (ISBN STRING, BookTitle STRING, BookAuthor STRING, YearOfPublication STRING, Publisher STRING, ImageURLS STRING, ImageURLM STRING, ImageURLL STRING) COMMENT 'BX-Books Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINE TERMINATED BY '\n' STORED AS TEXTFILE; However, when I submitted it to Hive I got the following exception: MismatchedTokenException(-1!=301) at org.antlr.runtime.BaseRecognizer
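One likely culprit in the statement above is the clause LINE TERMINATED BY: the Hive DDL keyword is LINES, plural. A sketch of the same statement with only that keyword changed, offered as a probable fix rather than a confirmed one, since the full stack trace is cut off:

```sql
CREATE TABLE IF NOT EXISTS BXDataSet (
  ISBN STRING,
  BookTitle STRING,
  BookAuthor STRING,
  YearOfPublication STRING,
  Publisher STRING,
  ImageURLS STRING,
  ImageURLM STRING,
  ImageURLL STRING
)
COMMENT 'BX-Books Table'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ';'
  LINES TERMINATED BY '\n'   -- LINES, not LINE
STORED AS TEXTFILE;
```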

Hive - Split delimited columns over multiple rows, select based on position

醉酒当歌 submitted on 2019-12-07 10:13:50
Question: I'm looking for a way to split columns that hold comma-delimited data. Below is my dataset:
id  col1  col2
1   5,6   7,8
I want to get this result:
id  col1  col2
1   5     7
1   6     8
The positions must match, because I need to fetch results accordingly. I tried the query below, but it returns the Cartesian product.
Query : SELECT col3, col4 FROM test ext lateral VIEW explode(split(col1,'\002')) col1 AS col3 lateral VIEW explode(split(col2,'\002')) col2 AS col4
Result : id col1 col2 1 5 7 1 5 8 1 6
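A minimal sketch of one common way to keep positions aligned, using posexplode (available from Hive 0.13), which emits each element together with its index. It assumes the values are separated by a plain comma, as in the sample data, rather than by the '\002' delimiter used in the query; the aliases a, b, pos1, pos2, val1 and val2 are illustrative:

```sql
-- Explode both columns with their positions, then keep only matching positions.
SELECT t.id, a.val1 AS col1, b.val2 AS col2
FROM test t
LATERAL VIEW posexplode(split(t.col1, ',')) a AS pos1, val1
LATERAL VIEW posexplode(split(t.col2, ',')) b AS pos2, val2
WHERE a.pos1 = b.pos2;
```

The two lateral views still form every combination internally; the WHERE clause is what discards the mismatched pairs.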

Creating a rank that resets on a specific value of a column

白昼怎懂夜的黑 submitted on 2019-12-07 08:27:23
Question: My current data looks like this (note that it is sorted on datetime):
+----------------+---------------------+---------+
| CustomerNumber | Date                | Channel |
+----------------+---------------------+---------+
| 120584446      | 2015-05-22 21:16:05 | A       |
| 120584446      | 2015-05-25 18:04:16 | A       |
| 120584446      | 2015-05-25 18:05:25 | B       |
| 120584446      | 2015-05-28 20:35:09 | A       |
| 120584446      | 2015-05-28 20:36:01 | A       |
| 120584446      | 2015-05-28 20:37:02 | B       |
| 120584446      | 2015-05-29 13:39:00 | B       |
+--------

Write a nested select statement with a where clause in Hive

若如初见. submitted on 2019-12-07 08:15:24
I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows: select * from TableA where TA_timestamp > (select timestmp from TableB where id="hourDim") Is this possible, or am I doing something wrong here? I am getting an error while running the above script. To elaborate on what I am trying to do: there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly, for example) these stats will be summarized using Hive; once summarized, that data will be stored separately with the
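Older Hive releases do not accept a scalar subquery in a comparison like the one above. A sketch of a common rewrite that moves the subquery into a join. The max() aggregate is an assumption added here so the subquery returns exactly one row; the aliases a, b and ts are illustrative:

```sql
-- Compute the cutoff once, then compare every TableA row against it.
SELECT a.*
FROM TableA a
CROSS JOIN (
  SELECT max(timestmp) AS ts   -- assumption: collapse to a single cutoff value
  FROM TableB
  WHERE id = 'hourDim'
) b
WHERE a.TA_timestamp > b.ts;
```

Because this is a join without a join condition, a Hive session running in strict mode may reject it unless Cartesian products are allowed.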

Get the sysdate -1 in Hive

妖精的绣舞 submitted on 2019-12-07 05:06:02
Question: Is there any way to get the current date minus 1 in Hive, that is, always yesterday's date, in this format: 20120805? Today is Aug 6th, so I can run my query like this to get the data for yesterday's date: select * from table1 where dt = '20120805'; The table is partitioned on the date (dt) column, so I tried using the date_sub function to get yesterday's date: select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyyMMdd')) , 1) limit 10; It is
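A likely problem with the query above is that FROM_UNIXTIME(..., 'yyyyMMdd') produces a string such as 20120806, which TO_DATE and date_sub do not read as a date. A minimal sketch that does the date arithmetic in the default yyyy-MM-dd form and strips the dashes only at the end, assuming dt is stored as a yyyyMMdd string:

```sql
SELECT *
FROM table1
WHERE dt = regexp_replace(
             date_sub(to_date(from_unixtime(unix_timestamp())), 1),  -- yesterday as yyyy-MM-dd
             '-', ''                                                 -- reshape to yyyyMMdd
           )
LIMIT 10;
```

Since unix_timestamp() is evaluated at run time, Hive may not be able to prune partitions with this predicate; passing the date in from the calling script as a variable is a common alternative.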

unable to create hive table with primary key

爷,独闯天下 submitted on 2019-12-07 02:15:18
Question: I am unable to create an external table in Hive with a primary key. The following is the example code: hive> create table exmp((name string),primary key(name)); This returns the following error message: NoViableAltException(278@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11216) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:35977) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser
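Two things may be going wrong in the statement above: the extra parentheses around the column definition, and the Hive version, since PRIMARY KEY constraints are only accepted from Hive 2.1.0 onward and must be declared as unenforced. A sketch under the assumption that a recent enough Hive is in use:

```sql
-- Hive records the constraint as metadata only; it does not enforce uniqueness.
CREATE TABLE exmp (
  name STRING,
  PRIMARY KEY (name) DISABLE NOVALIDATE
);
```

On releases before 2.1.0 the PRIMARY KEY clause has to be dropped altogether.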