hiveql | 易学教程

Hive date/timestamp column

Question: I have some data on HDFS that I am trying to set up to be queried via Hive. The data is in the form of comma-separated text files. One of the columns in the file is a date/time column in the following form: Wed Aug 29 16:16:58 CDT 2018 When I try to read the Hive table created using the following script, I get NULL as the value for this column. use test_db; drop table ORDERS; create external table ORDERS( SAMPLE_DT_TM TIMESTAMP ... ) row format delimited fields terminated by ',' stored as
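A common cause of the NULL is that the text does not match the `yyyy-MM-dd HH:mm:ss` layout Hive expects for a TIMESTAMP column in a text file. The following is a minimal sketch of one workaround, assuming the column is redeclared as STRING in the external table and parsed at query time. The pattern letters follow Java's SimpleDateFormat, and whether an abbreviation such as CDT resolves as intended depends on the JVM, so treat this as a starting point rather than a confirmed fix.

```sql
-- Assumption: SAMPLE_DT_TM is declared as STRING (not TIMESTAMP) in the DDL,
-- and holds values such as 'Wed Aug 29 16:16:58 CDT 2018'.
SELECT
  sample_dt_tm,
  -- unix_timestamp(string, pattern) parses the text into epoch seconds;
  -- from_unixtime() renders it as 'yyyy-MM-dd HH:mm:ss' in the session time zone.
  CAST(from_unixtime(unix_timestamp(sample_dt_tm, 'EEE MMM dd HH:mm:ss zzz yyyy')) AS TIMESTAMP) AS sample_ts
FROM orders;
```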

Hive count tuple?

I am pretty new to HiveQL and I am kinda stuck :S I have a table with the following schema: one column named res, and three partitions under a partition column named field. create table results( res string) PARTITIONED BY (field STRING); I then imported data into this table: insert overwrite table results PARTITION (field= 'title') SELECT explode(line) AS myNewCol FROM titles ; insert overwrite table results PARTITION (field= 'artist') SELECT explode(line) AS myNewCol FROM artist; insert overwrite table results PARTITION (field= 'albums') SELECT explode(line) AS myNewCol FROM albums; I am trying to

Hive Query: Matching column Values from Array of string to make Flags

Question: I have some records where every row belongs to some categories (data type: array of string), and a separate list of unique categories (data type: string). I need to match every row against the unique list and create flags for it. Input: ------ ID Category 1 ["Physics","Math"] 2 ["Math"] 3 ["Math","Chemistry"] 4 ["Physics","Computer"] Now I have a separate list of unique categories in a local Excel file, like below: Unique Category ["Physics"] ["Math"] ["Chemistry"] ["Computer"] Final Output should look
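Since the expected output is cut off above, the following is only a sketch of one plausible reading: one 0/1 flag column per unique category, built with Hive's `array_contains`. The table name `records` and the flag column names are hypothetical, and the four categories are hard-coded from the list in the question.

```sql
-- Assumption: `records` (hypothetical name) has columns id INT and category ARRAY<STRING>.
SELECT
  id,
  IF(array_contains(category, 'Physics'),   1, 0) AS physics_flag,
  IF(array_contains(category, 'Math'),      1, 0) AS math_flag,
  IF(array_contains(category, 'Chemistry'), 1, 0) AS chemistry_flag,
  IF(array_contains(category, 'Computer'),  1, 0) AS computer_flag
FROM records;
```

If the list of categories changes often, loading it into its own table and joining against the exploded array may be easier to maintain than hard-coded columns.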

How to get the Date of the first day of a week given a time stamp in Hadoop Hive?

Besides writing a custom UDF to support this, are there any known methods of achieving this? I'm currently using Hive 0.13. Boris Ka date_sub(m.invitationdate,pmod(datediff(m.invitationdate,'1900-01-07'),7)) This expression gives the exact solution to my question. Regards, Boris This is the easiest and best solution for fetching the date of the first day of the week: For the current timestamp: select date_sub(from_unixtime(unix_timestamp()), cast(from_unixtime(unix_timestamp(), 'u') AS int)) ; For any given date or column: select date_sub(from_unixtime(unix_timestamp('2017-05-15','yyyy-MM-dd')), cast
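The `pmod`/`datediff` expression works because 1900-01-07 was a Sunday: the number of days since that anchor, modulo 7, is the number of days since the most recent Sunday. A small sketch with a literal date, assuming weeks that start on Sunday; shifting the anchor to 1900-01-08 (a Monday) gives Monday-based weeks instead. The column aliases are illustrative only.

```sql
-- 2017-05-15 was a Monday.
SELECT
  -- Days since the last Sunday = 1, so this returns 2017-05-14.
  date_sub('2017-05-15', pmod(datediff('2017-05-15', '1900-01-07'), 7)) AS week_start_sunday,
  -- Days since the last Monday = 0, so this returns 2017-05-15.
  date_sub('2017-05-15', pmod(datediff('2017-05-15', '1900-01-08'), 7)) AS week_start_monday;
```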

MismatchedTokenException on hive create table query

Question: I'm trying to create a Hive table with the following query: CREATE TABLE IF NOT EXISTS BXDataSet (ISBN STRING, BookTitle STRING, BookAuthor STRING, YearOfPublication STRING, Publisher STRING, ImageURLS STRING, ImageURLM STRING, ImageURLL STRING) COMMENT 'BX-Books Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINE TERMINATED BY '\n' STORED AS TEXTFILE; However, when I submitted it to Hive I got the following exception: MismatchedTokenException(-1!=301) at org.antlr.runtime.BaseRecognizer
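One thing that stands out in the statement above is `LINE TERMINATED BY`; Hive's DDL grammar spells this clause `LINES TERMINATED BY`. A sketch of the same statement with only that keyword changed follows. Some clients also treat the `;` inside the quoted delimiter as a statement terminator, so writing it as the octal escape `'\073'` may additionally be needed; that part is an assumption about the client in use, not something the excerpt confirms.

```sql
CREATE TABLE IF NOT EXISTS BXDataSet (
  ISBN              STRING,
  BookTitle         STRING,
  BookAuthor        STRING,
  YearOfPublication STRING,
  Publisher         STRING,
  ImageURLS         STRING,
  ImageURLM         STRING,
  ImageURLL         STRING
)
COMMENT 'BX-Books Table'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\073'   -- octal escape for ';' (assumed workaround for CLI parsing)
  LINES TERMINATED BY '\n'      -- LINES, not LINE
STORED AS TEXTFILE;
```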

Hive - Split delimited columns over multiple rows, select based on position

Question: I'm looking for a way to split columns that hold comma-delimited data. Below is my dataset: id col1 col2 1 5,6 7,8 I want to get the result: id col1 col2 1 5 7 1 6 8 The positions should match, because I need to fetch results accordingly. I tried the query below, but it returns the cartesian product. Query : SELECT col3, col4 FROM test ext lateral VIEW explode(split(col1,'\002')) col1 AS col3 lateral VIEW explode(split(col2,'\002')) col2 AS col4 Result : id col1 col2 1 5 7 1 5 8 1 6
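Two chained `explode` views pair every element of one column with every element of the other, which is where the cartesian product comes from. One way to keep positions aligned is `posexplode` (available from Hive 0.13), which emits each element together with its index so the two views can be filtered to matching positions. This sketch assumes the values are separated by a literal comma, as in the sample data; the aliases `a`, `b`, `pos1`, and `pos2` are illustrative.

```sql
SELECT
  t.id,
  col3,
  col4
FROM test t
LATERAL VIEW posexplode(split(t.col1, ',')) a AS pos1, col3
LATERAL VIEW posexplode(split(t.col2, ',')) b AS pos2, col4
-- Keep only the pairs that sit at the same index in both arrays.
WHERE pos1 = pos2;
```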

Creating a rank that resets on a specific value of a column

Question: My current data looks like this (note that it is sorted on datetime):

+----------------+---------------------+---------+
| CustomerNumber | Date                | Channel |
+----------------+---------------------+---------+
| 120584446      | 2015-05-22 21:16:05 | A       |
| 120584446      | 2015-05-25 18:04:16 | A       |
| 120584446      | 2015-05-25 18:05:25 | B       |
| 120584446      | 2015-05-28 20:35:09 | A       |
| 120584446      | 2015-05-28 20:36:01 | A       |
| 120584446      | 2015-05-28 20:37:02 | B       |
| 120584446      | 2015-05-29 13:39:00 | B       |
+--------

Write a nested select statement with a where clause in Hive

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows: select * from TableA where TA_timestamp > (select timestmp from TableB where id="hourDim") Is this possible, or am I doing something wrong here? I am getting an error while running the above script. To further elaborate on what I am trying to do: there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly, for example) these stats will be summarized using Hive; once summarized, that data will be stored separately with the
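Older Hive releases do not accept a scalar subquery on the right-hand side of a comparison in WHERE, which would explain the error. A common rewrite, sketched here under the assumption that the subquery returns exactly one row, moves the subquery into the FROM clause and joins it in. The aliases `a` and `b` are illustrative.

```sql
SELECT a.*
FROM TableA a
CROSS JOIN (
  -- Assumed to return a single row; more rows would duplicate TableA rows.
  SELECT timestmp
  FROM TableB
  WHERE id = 'hourDim'
) b
WHERE a.TA_timestamp > b.timestmp;
```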

Get the sysdate -1 in Hive

Question: Is there any way to get the current date minus 1 in Hive, meaning always yesterday's date, in the format 20120805? I can run my query like this to get the data for yesterday's date, as today is Aug 6th: select * from table1 where dt = '20120805'; But I tried doing it the following way with the date_sub function, as the table below is partitioned on the date (dt) column. select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(), 'yyyyMMdd')) , 1) limit 10; It is
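One likely problem with the attempt above is that `date_sub` and `TO_DATE` expect a `yyyy-MM-dd` string, while `FROM_UNIXTIME(..., 'yyyyMMdd')` produces `20120806`. Two sketches of alternatives follow. The first assumes Hive 1.2 or later, where `current_date` and `date_format` are available. The second uses only functions found in older releases and subtracts 86400 seconds, which can be off by an hour around daylight-saving transitions.

```sql
-- Hive 1.2+ (assumed): subtract a day, then format without separators.
SELECT * FROM table1
WHERE dt = date_format(date_sub(current_date, 1), 'yyyyMMdd')
LIMIT 10;

-- Older Hive: subtract one day's worth of seconds before formatting.
SELECT * FROM table1
WHERE dt = from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd')
LIMIT 10;
```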

unable to create hive table with primary key

Question: I am unable to create an external table in Hive with a primary key. Following is the example code: hive> create table exmp((name string),primary key(name)); This returns the following error message: NoViableAltException(278@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11216) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:35977) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser
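Two things in the statement above would trip the parser: the extra parentheses around `(name string)`, and, on releases before Hive 2.1.0, the `primary key` clause itself, which those versions do not recognize. A sketch for Hive 2.1.0 or later follows, assuming that version is available. Hive treats such constraints as informational only, which is why `DISABLE NOVALIDATE` is required.

```sql
-- Assumes Hive 2.1.0 or later; earlier versions have no PRIMARY KEY syntax.
CREATE TABLE exmp (
  name STRING,
  -- Declared for metadata only; Hive does not enforce uniqueness.
  PRIMARY KEY (name) DISABLE NOVALIDATE
);
```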