hiveql | 易学教程

Dynamic partition cannot be the parent of a static partition '3'

While inserting data into a table, Hive threw the error "Dynamic partition cannot be the parent of a static partition '3'" for the query below:

    INSERT INTO TABLE student_partition PARTITION(course , year = 3) SELECT name, id, course FROM student1 WHERE year = 3;

Please explain the reason.

The exception occurs because partitions are hierarchical folders: course is the upper-level folder, and year is a folder nested inside each course. When partitions are created dynamically, the upper folder (course) has to be created first, and only then the nested year=3 folder. You are providing the year=3 partition in advance
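One possible way around the error, sketched under the assumption that student_partition is declared with PARTITIONED BY (course STRING, year INT) and has name and id as its only regular columns (the excerpt does not show the table definition): make both partition columns dynamic, so that no static partition sits below a dynamic one.

```sql
-- Assumed schema: student_partition(name, id) PARTITIONED BY (course, year).
-- Dynamic partitioning must be enabled; nonstrict mode is needed
-- when every partition column is dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition columns come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT INTO TABLE student_partition PARTITION (course, year)
SELECT name, id, course, year
FROM student1
WHERE year = 3;
```

If the table were instead partitioned by (year, course), the static value could stay in front, as in PARTITION (year = 3, course), because a static partition may be the parent of a dynamic one.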

Hive - LIKE Operator

Question: I cannot figure out how to deal with this problem. This is my data:

Table1:
BRAND
Sony
Apple
Google
IBM
etc.

Table2:
PRODUCT      SOLD
Sony ABCD    1233
Sony adv     1233
Sony aaaa    1233
Apple 123    1233
Apple 345    1233
IBM 13123    1233

Is it possible to write a query that returns each brand together with its total sold? My idea is:

    Select table1.brand, sum(table2.sold) from table1 join table2 on (table1.brand LIKE '%table2.product%') group by table.1.brand

That was my idea, but I always get an
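Two things stand out in the attempted query: '%table2.product%' is a string literal rather than a reference to the column, and the comparison is reversed (the product contains the brand, not the other way round). A minimal sketch of one alternative follows, assuming the column names shown in the sample data. It uses a cross join plus a WHERE filter because older Hive releases accept only equality conditions in JOIN ... ON; this may be slow on large tables and can be rejected when Hive runs in strict mode.

```sql
-- Build the LIKE pattern from the brand column with CONCAT,
-- then aggregate sold units per brand.
SELECT t1.brand,
       SUM(t2.sold) AS total_sold
FROM table1 t1
CROSS JOIN table2 t2
WHERE t2.product LIKE CONCAT('%', t1.brand, '%')
GROUP BY t1.brand;
```

Brands without any matching product (Google in the sample) would not appear in the result of this sketch.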

Convert string to timestamp in Hive

Question: I have the following string representation of a timestamp in my Hive table:

    20130502081559999

I need to convert it to a string like so:

    2013-05-02 08:15:59

I have tried the following ({code} >>> {result}):

    from_unixtime(unix_timestamp('20130502081559999', 'yyyyMMddHHmmss')) >>> 2013-05-03 00:54:59
    from_unixtime(unix_timestamp('20130502081559999', 'yyyyMMddHHmmssMS')) >>> 2013-09-02 08:15:59
    from_unixtime(unix_timestamp('20130502081559999', 'yyyyMMddHHmmssMS')) >>> 2013-05-02 08:10:39

Converting
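The first result is consistent with lenient parsing: the trailing ss field appears to swallow all five remaining digits, and 59999 seconds (16 h 39 min 59 s) added to 08:15:00 lands on 00:54:59 the next day. A minimal sketch of one workaround, assuming the value always carries 14 date-time digits followed by milliseconds, is to cut the string to those 14 digits before parsing:

```sql
-- Keep only yyyyMMddHHmmss (the first 14 characters), dropping the milliseconds,
-- then render the result in from_unixtime's default 'yyyy-MM-dd HH:mm:ss' form.
SELECT from_unixtime(
         unix_timestamp(substr('20130502081559999', 1, 14), 'yyyyMMddHHmmss')
       );
```

Against a table, the literal would be replaced by the column holding the string.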

Hive select data into an array of structs

I am trying to figure out a way in Hive to select data from a flat source and output it into an array of named struct(s). Here is an example of what I am looking for...

Sample Data:

house_id,first_name,last_name
1,bob,jones
1,jenny,jones
2,sally,johnson
3,john,smith
3,barb,smith

Desired Output:

1 [{"first_name":"bob","last_name":"jones"},{"first_name":"jenny","last_name":"jones"}]
2 [{"first_name":"sally","last_name":"johnson"}]
3 [{"first_name":"john","last_name":"smith"},{"first_name":"barb","last_name":"smith"}]

I tried collect_list and collect_set but they only allow primitive data types. Any
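As a rough sketch only: newer Hive releases are understood to accept struct arguments in collect_list, in which case the desired shape can be produced by combining it with named_struct. Whether this works depends on the Hive version in use (on the version described in the question it does not), and the table name people is hypothetical.

```sql
-- Assumes a Hive version whose collect_list accepts struct arguments.
-- `people` is a hypothetical table holding the sample data.
SELECT house_id,
       collect_list(
         named_struct('first_name', first_name, 'last_name', last_name)
       ) AS residents
FROM people
GROUP BY house_id;
```

The order of the structs inside each array is not guaranteed.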

Find TOP 10 latest record for each BUYER_ID for yesterday's date

This is the table:

    CREATE TABLE IF NOT EXISTS TestingTable1 ( BUYER_ID BIGINT, ITEM_ID BIGINT, CREATED_TIME STRING )

And this is the data in the above table:

BUYER_ID   | ITEM_ID      | CREATED_TIME
-----------+--------------+---------------------
1015826235   220003038067   2012-07-09 19:40:21,
1015826235   300003861266   2012-07-09 18:19:59,
1015826235   140002997245   2012-07-09 09:23:17,
1015826235   210002448035   2012-07-09 22:21:11,
1015826235   260003553381   2012-07-09 07:09:56,
1015826235   260003553382   2012-07-09 19:40:39,
1015826235   260003553383   2012-07-09 06:58:47,
1015826235   260003553384
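A minimal sketch of one way to approach the task in the title, assuming a Hive version with windowing functions (0.11 or later) and assuming CREATED_TIME holds 'yyyy-MM-dd HH:mm:ss' strings as in the sample: number each buyer's rows from newest to oldest with ROW_NUMBER and keep the first ten.

```sql
-- Rank each buyer's rows from yesterday by creation time, newest first.
SELECT buyer_id, item_id, created_time
FROM (
  SELECT buyer_id,
         item_id,
         created_time,
         ROW_NUMBER() OVER (PARTITION BY buyer_id
                            ORDER BY created_time DESC) AS rn
  FROM TestingTable1
  -- "Yesterday" is derived from the current time at query execution.
  WHERE to_date(created_time) =
        date_sub(to_date(from_unixtime(unix_timestamp())), 1)
) ranked
WHERE rn <= 10;
```

Ordering by the string column works here only because the 'yyyy-MM-dd HH:mm:ss' layout sorts chronologically.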

hive-site.xml path in hive0.13.1

I'm a newbie. I would like to know where the hive-site.xml and hive-default.xml files are located in Hive 0.13.1. I downloaded the hive0.13.1-bin release from the location below:

http://apache.mirrors.pair.com/hive/hive-0.13.1/

I extracted it and configured the Hive environment variables. I am able to run commands (create table, show, load data, query table..). But in the conf directory (/hive/hive-0.13-1/conf) I do not see the hive-site.xml and hive-default.xml files. Where are these files located in Hive 0.13.1?

Follow these steps:
1) Extract the folder
2) Go to /apache-hive-0.13.1-bin/conf and

Hive's unix_timestamp and from_unixtime functions

I am under the impression that the unix_timestamp and from_unixtime Hive functions are the 'reverse' of each other. When I try to convert a timestamp string to seconds in Hive:

    SELECT unix_timestamp('10-Jun-15 10.00.00.000000 AM', 'dd-MMM-yy hh.mm.ss.MS a');

I get 1418176800. When I try to convert 1418176800 back to a timestamp string:

    SELECT from_unixtime(1418176800, 'dd-MMM-yy hh.mm.ss.MS a');

I get 10-Dec-14 10.00.00.120 AM, which is obviously not equal to the original. Can someone explain what's going on? Thanks.

From the language manual: Convert time string with given pattern to Unix time stamp (in
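In the Java SimpleDateFormat pattern language that these functions have traditionally used, M is the month and S is the millisecond, so 'MS' is not a milliseconds token. That reading fits the output above: "120" is the month 12 followed by 0 milliseconds, and the month parsed from the fraction digits (0) rolls the date back to December 2014. A hedged sketch of a round trip with an all-S fraction field follows; it assumes SimpleDateFormat-style patterns, which may not hold on newer Hive releases.

```sql
-- Parse and format with the same pattern, using S (millisecond) for the fraction.
-- unix_timestamp returns whole seconds, so the fraction is not preserved;
-- it only round-trips here because the input fraction is all zeros.
SELECT from_unixtime(
         unix_timestamp('10-Jun-15 10.00.00.000000 AM',
                        'dd-MMM-yy hh.mm.ss.SSSSSS a'),
         'dd-MMM-yy hh.mm.ss.SSSSSS a'
       );
```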

How do I get millisecond precision in hive?

The documentation says that timestamps support the following conversion:

• Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision

First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert it to a numeric type that retains the milliseconds? That is what I want. More generally, I need to do comparisons between timestamps such as

    select maxts - mints as latency from mytable

where maxts and mints are timestamp columns. Currently, this gives me a NullPointerException using Hive 0.11.0. I am able to
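One reading of the documented conversion, given as a sketch rather than a confirmed fix for the NullPointerException: casting a timestamp to DOUBLE yields seconds since the epoch with the fractional part retained, so the difference of two casts is a latency in seconds. The column and table names are the ones from the question.

```sql
-- Cast both timestamps to fractional epoch seconds before subtracting.
SELECT CAST(maxts AS DOUBLE) - CAST(mints AS DOUBLE) AS latency_seconds
FROM mytable;

-- Milliseconds can be derived by scaling, if whole numbers are preferred.
SELECT (CAST(maxts AS DOUBLE) - CAST(mints AS DOUBLE)) * 1000 AS latency_ms
FROM mytable;
```

Because DOUBLE is a binary floating-point type, the scaled value may carry small rounding error.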

Can you explain when and why mapreduce is invoked in hive

    select * from Table_name limit 5;
    select col1_name,col2_name from table_name limit 5;

When I run the first query, no MapReduce job is invoked, while for the second one MapReduce is invoked. Could you please explain the reason?

To understand the reason, we first need to know what the map and reduce phases mean:

Map: Basically a filter that selects and organizes data in sorted order. For example, in the second query it picks col1_name and col2_name out of each row. In the first query, however, you are reading every column, so no filtering is required. Hence there is no Map phase.

Reduce: Reduce is just summary operation
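To check what Hive actually plans for each query, EXPLAIN can be run on both. As a further hedged note, Hive has a setting named hive.fetch.task.conversion that controls which simple queries are served by a plain fetch instead of a MapReduce job; its default and exact behaviour depend on the Hive version, so treat the sketch below as something to try rather than a guaranteed outcome.

```sql
-- Compare the plans: a fetch-only plan has no map/reduce stage.
EXPLAIN SELECT * FROM table_name LIMIT 5;
EXPLAIN SELECT col1_name, col2_name FROM table_name LIMIT 5;

-- With the "more" level, simple column projections with LIMIT
-- may also be answered without launching MapReduce.
SET hive.fetch.task.conversion=more;
EXPLAIN SELECT col1_name, col2_name FROM table_name LIMIT 5;
```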

Hive doesn't support in, exists. How do I write the following query?

I have two tables A and B that both have a column id. I wish to obtain the ids from A that are not present in B. The obvious way is:

    SELECT id FROM A WHERE id NOT IN (SELECT id FROM B)

Unfortunately, Hive doesn't support in, exists or subqueries. Is there a way to achieve the above using joins? I thought of the following:

    SELECT A.id FROM A,B WHERE A.id<>B.id

But it seems like this will return the entirety of A, since for every id in A there is always some id in B that is not equal to it.

You can do the same with a LEFT OUTER JOIN in Hive:

    SELECT A.id FROM A LEFT OUTER JOIN B ON (B.id = A.id) WHERE B.id
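For reference, the usual form of this anti-join pattern is sketched below; it is the standard technique and not necessarily the exact query from the truncated answer. Rows of A without a partner in B come out of the outer join with NULL in B's columns, so filtering on IS NULL keeps exactly the ids missing from B.

```sql
-- Anti-join: ids in A that have no match in B.
SELECT A.id
FROM A
LEFT OUTER JOIN B ON (B.id = A.id)
WHERE B.id IS NULL;

-- The complementary case (ids in A that do appear in B)
-- can be written with Hive's LEFT SEMI JOIN.
SELECT A.id
FROM A
LEFT SEMI JOIN B ON (A.id = B.id);
```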