hiveql | 易学教程

How to build a hive table on data which is separated by '^P' delimiter

Question: My query is: CREATE EXTERNAL TABLE gateway_staging ( poll int, total int, transaction_id int, create_time timestamp, update_time timestamp ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^P'; (I am not sure whether '^P' can be used as a delimiter, but I tried it.) All fields show up as 'none' when I load the data into the Hive table. The data looks like: 4307421698^P200^P138193920770^P2017-03-08 02:46:18.021204^P2017-03-08 02:46:18.021204 Please help me out.

Answer 1: Here are the options: ...
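One possible option can be sketched as follows, assuming that `^P` in the data is the single Ctrl-P control character (ASCII 16, octal 020) rather than a literal caret followed by `P`. The `bigint` columns are a change from the question's DDL: sample values such as 4307421698 and 138193920770 exceed the `int` range and would otherwise come back as NULL.

```sql
-- Sketch only: assumes the delimiter is the Ctrl-P control character (ASCII 16).
CREATE EXTERNAL TABLE gateway_staging (
  poll           bigint,     -- 4307421698 does not fit in int
  total          int,
  transaction_id bigint,     -- 138193920770 does not fit in int
  create_time    timestamp,
  update_time    timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\020';  -- octal escape for ASCII 16
```

If the file really contains the two visible characters `^` and `P`, a single-character delimiter will not work and a different approach (for example a regex-based SerDe) would be needed.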

How to get previous day date in Hive

Question: I am a novice with Hive. I am trying to get the previous day's date using the query below: SELECT MAX(id) FROM store_rcd_table WHERE recon_dt = unix_timestamp(date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'),1),'yyyy-MM-dd') but I get NULL as output. The output should have been the date (2017-09-23) and MAX(id). I also tried: Select MAX(id) FROM store_rcd_table WHERE recon_dt ='2017-09-24'; This query returns no output either; only OK is printed. I don't understand what the issue is. Any suggestion/
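A likely cause is a type mismatch: `unix_timestamp(...)` returns a number of seconds, which is then compared with a date-like column. A minimal sketch of a direct date comparison follows, assuming `recon_dt` holds values formatted as `yyyy-MM-dd`; the actual column type is not shown in the question.

```sql
-- Assumes recon_dt holds dates formatted as yyyy-MM-dd (string or date type).
-- current_date is available from Hive 1.2.0; on older versions,
-- to_date(from_unixtime(unix_timestamp())) is a common substitute.
SELECT recon_dt,
       MAX(id) AS max_id
FROM store_rcd_table
WHERE recon_dt = date_sub(current_date, 1)
GROUP BY recon_dt;
```

If this still returns no rows, it is worth checking what `recon_dt` actually contains, for example with `SELECT DISTINCT recon_dt FROM store_rcd_table LIMIT 10;`.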

How to query struct array with Hive (get_json_object)?

Question: I store the following JSON objects in a Hive table: { "main_id": "qwert", "features": [ { "scope": "scope1", "name": "foo", "value": "ab12345", "age": 50, "somelist": ["abcde","fghij"] }, { "scope": "scope2", "name": "bar", "value": "cd67890" }, { "scope": "scope3", "name": "baz", "value": [ "A", "B", "C" ] } ] } "features" is an array of varying length, i.e. all objects are optional. The objects have arbitrary elements, but all of them contain "scope", "name" and "value". This is the Hive
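For reference, `get_json_object` addresses array elements by index in its path argument. The sketch below uses hypothetical names, since the question's table definition is cut off: a table `json_table` with a string column `json_str` holding one JSON object per row.

```sql
-- Hypothetical names: json_table and json_str are not from the question.
SELECT get_json_object(json_str, '$.main_id')           AS main_id,
       get_json_object(json_str, '$.features[0].scope') AS first_scope,
       get_json_object(json_str, '$.features[0].name')  AS first_name,
       get_json_object(json_str, '$.features[0].value') AS first_value
FROM json_table;
```

The function returns strings, so a nested value such as the `["A", "B", "C"]` array comes back as JSON text rather than as a Hive array.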

Conditional aggregate with Group By clause

Question: I'm trying to do this with HiveQL, but I don't know how to do it in SQL either. The table structure is as follows:

id1  id2  category
123  abc  1
123  def  1
123  def  2
456  abc  1
123  abc  1
123  abc  2
...

I'd like to write a query that outputs:

key      count  category1count  category2count
123-abc  3      2               1
123-def  2      1               1
456-abc  1      1               0

So far I've got this: SELECT concat( concat(id1,'-'), id2), count(*) , count( SELECT * WHERE buyingcategory = 1 ??? ) , count( SELECT * WHERE buyingcategory = 2 ??? ) FROM table GROUP
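A common way to express per-category counts in one pass is a conditional aggregate: a `CASE` expression inside `sum`. The sketch below assumes a hypothetical table name `my_table` and uses the column name `category` from the sample data (the question's draft query calls it `buyingcategory`).

```sql
-- Hypothetical table name my_table; columns id1, id2, category as in the sample data.
SELECT concat(id1, '-', id2)                         AS id_key,
       count(*)                                      AS total_count,
       sum(CASE WHEN category = 1 THEN 1 ELSE 0 END) AS category1count,
       sum(CASE WHEN category = 2 THEN 1 ELSE 0 END) AS category2count
FROM my_table
GROUP BY id1, id2;
```

Using `ELSE 0` rather than leaving the `CASE` without a default keeps the result at 0 instead of NULL for groups with no matching rows, as in the 456-abc row of the desired output.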

SemanticException [Error 10004]: Line 11:18 Invalid table alias or column reference

Question: This question is related to the answer to my previous question. Please note that datetime is a string, so I convert it to a Unix timestamp. I execute this query in Hive 1.2.1: select count(*) / count(distinct to_date(datetime)) as trips_per_day from (select radar_id,to_unix_timestamp(datetime),lead(radar_id) over w as next_radar_id,lead(to_unix_timestamp(datetime)) over w as next_datetime from mytable where radar_id in ('A21','B15') window w as (partition by car_id order by to_unix

How to exclude special characters in a string using regular expressions in hive

Question: I want to exclude periods ( . ) and parentheses ( ( , ) ). However, decimal numbers should be left intact. So basically, if the input is:

Hive supports subqueries only in the FROM clause (through Hive 0.12). The subquery has to be given a name because every table in a FROM clause must have a name. Columns in the subquery select list must have unique names.

The output should be:

Hive supports subqueries only in the FROM clause through Hive 0.12 The subquery has to be given a name because every table in
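One way to sketch this is a single `regexp_replace` with a negative lookahead. The rule assumed here is that a period is kept only when a digit immediately follows it; that is an interpretation of "decimal numbers", not something the question spells out. Character classes are used instead of backslash escapes to sidestep string-literal escaping.

```sql
-- Removes parentheses and every period that is not immediately followed by a digit.
SELECT regexp_replace(
         'Hive supports subqueries only in the FROM clause (through Hive 0.12).',
         '[.](?![0-9])|[()]',
         ''
       ) AS cleaned;
-- cleaned: Hive supports subqueries only in the FROM clause through Hive 0.12
```

A `SELECT` without a `FROM` clause requires Hive 0.13 or later; on older versions the expression can be applied to a column of an existing table instead.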

retrieve udf results in Hive

Question: In the following HiveQL code, I want to add a partition to an existing table: -- my_table was defined and partitioned by `dt string`, which is a date -- now I want to add a partition alter table my_table add if not exists partition (dt=current_date()); #FAILED: ParseException line 1:72 extraneous input '(' expecting ) near '<EOF>' alter table my_table add if not exists partition (dt=${current_date()}); # FAILED: ParseException line 1:60 cannot recognize input near '$' '{' 'current_date' in constant
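Both attempts fail at parse time because a partition spec in `ALTER TABLE ... ADD PARTITION` expects a constant, not a function call. A possible workaround, sketched under the assumption that the statement is run from a script, is to compute the date outside Hive and pass it in through variable substitution. The script name and the date value below are placeholders.

```sql
-- Sketch: the partition value is supplied from outside via a Hive variable.
-- Assumed invocation (script name is hypothetical):
--   hive --hivevar dt=2017-09-24 -f add_partition.hql
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (dt = '${hivevar:dt}');
```

In a shell wrapper the value would typically come from something like `date +%F`, so the script itself never needs to evaluate `current_date()`.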

Hive - How to efficiently Create Table As Select?

Question: I have a Hive table, htable, that's partitioned on foo and bar. I want to create a small subset of this table for experiments, so I would think the thing to do would be: create table new_table like htable; insert into new_table partition (foo, bar) select * from htable where rand() < 0.01 and foo in (a,b) However, this takes forever and finally fails with a java.lang.OutOfMemoryError: Java heap space. Is there a better way?

Answer 1: Add distribute by foo, bar : insert into new_table partition (foo
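The answer's statement is cut off here, so the following is only a sketch of how the suggestion might look in full, not the answerer's exact text. The two `SET` lines and the quoted `'a'`, `'b'` values are assumptions added for completeness.

```sql
-- Dynamic-partition inserts typically need these settings (assumed, not from the answer).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE new_table LIKE htable;

INSERT INTO TABLE new_table PARTITION (foo, bar)
SELECT *
FROM htable
WHERE rand() < 0.01
  AND foo IN ('a', 'b')   -- placeholder values standing in for the question's a, b
DISTRIBUTE BY foo, bar;   -- sends all rows of a given (foo, bar) to the same reducer
```

The idea behind `DISTRIBUTE BY` is that each reducer then writes to only a few partitions, instead of every task holding an open writer for every partition it happens to see.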

Hive/SQL bundling columns for few columns,rest of the columns are pull based lowest/highest of other columns

Question: I have a Hive table with 5 columns, as below:

name     orderno  productcategory  amount  description
KJFSFKS  1        1                40      D1
KJFSFKS  2        2                50      D2
KJFSFKS  3        2                67      D3
KJFSFKS  4        2                10      D4
KJFSFKS  5        3                2       D5
KJFSFKS  6        3                5       D6
KJFSFKS  7        3                6       D7
KJFSFKS  8        4                8       D8
KJFSFKS  9        5                8       D9
KJFSFKS  10       5                10      D10

The desired output is based on the product category code: if the productcategory code is the same across multiple rows, add up the amount field, pick the description from the row with the highest orderno, and always pick the lowest orderno. Output as below: name orderno

Hive drops all the partitions if the partition column name is not correct

Question: I am facing a strange issue with Hive. I have a table partitioned on the basis of dept_key (it's an integer, e.g. 3212). The table is created as follows: create external table dept_details (dept_key,dept_name,dept_location) PARTITIONED BY (dept_key_partition INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' LOCATION '/dept_details/dept/'; Now I have some partitions already added, e.g.: 1204,1203,1204 When I tried dropping the partition, I typed only dept_key by mistake and not "dept_key_partition" this