hiveql | 易学教程

In Hive, how to combine multiple tables to produce single row containing array of objects?

Question: I have two tables as follows:

users table
==========================
| user_id  name  age     |
|=========================
| 1        pete  20      |
| 2        sam   21      |
| 3        nash  22      |
==========================

hobbies table
======================================
| user_id  hobby       time_spent    |
|=====================================
| 1        football    2             |
| 1        running     1             |
| 1        basketball  3             |
======================================

First question: I would like to make a single Hive query that can return rows in this format: {

Hive: Merging Configuration Settings not working

Question: On Hive 2.2.0, I am filling an ORC table from another source table of size 1.34 GB, using the query

    INSERT INTO TABLE TableOrc SELECT * FROM Table; ---- (1)

The query creates the TableOrc table with 6 ORC files, which are much smaller than the block size of 256 MB.

    -- FolderList1
    -rwxr-xr-x user1 supergroup 65.01 MB 1/1/2016, 10:14:21 AM 1 256 MB 000000_0
    -rwxr-xr-x user1 supergroup 67.48 MB 1/1/2016, 10:14:55 AM 1 256 MB 000001_0
    -rwxr-xr-x user1 supergroup 66.3 MB 1/1/2016, 10:15:18 AM 1 256 MB

How to make dynamic insert in hive from a field?

Question: I have a column that holds several dates, as follows:

    Sun Oct 22 05:35:03 2017
    Mon Apr 16 14:33:43 2018
    Fri Apr 13 10:41:43 2018

I've created a process to filter these dates and convert them to YYYYMMDD, as below:

    20171022
    20180416
    20180413

This result will be used to distribute the data into their respective partitions, which are daily. I tried to do it this way but did not succeed:

    insert into table tab2 PARTITION (REFERENCE_DATE = from_unixtime (unix_timestamp ('Sun Oct 22 05:35:03 2017', 'E
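The conversion the question describes can be checked on a few sample values outside Hive. The following is a minimal sketch in Python, not the asker's HiveQL: the function name `to_partition_key` is hypothetical, and the format string assumes English day and month abbreviations, as in the sample values.

```python
from datetime import datetime


def to_partition_key(raw: str) -> str:
    """Convert text like 'Sun Oct 22 05:35:03 2017' to a 'YYYYMMDD' string."""
    # %a = abbreviated weekday, %b = abbreviated month name (English assumed).
    parsed = datetime.strptime(raw, "%a %b %d %H:%M:%S %Y")
    return parsed.strftime("%Y%m%d")


samples = [
    "Sun Oct 22 05:35:03 2017",
    "Mon Apr 16 14:33:43 2018",
    "Fri Apr 13 10:41:43 2018",
]
for sample in samples:
    print(sample, "->", to_partition_key(sample))
```

Each resulting key corresponds to one daily partition value in the question's setup.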

Count Frequency of words of a Text variable with Hive

Question: I have a variable in which every row is a sentence. Example:

    Row1: "Hey, how are you?"
    Row2: "Hey, Who is there?"

I want the output to be the count grouped by word. Example:

    Hey 2
    How 1
    are 1
    ...

I am using the split function but I am a bit stuck. Any thoughts on this? Thanks!

Answer 1: This is possible in Hive. Split by non-alpha characters and use lateral view + explode, then count words:

    with your_data as (
      select stack(2, 'Hey, how are you?', 'Hey, Who is there?') as initial_string
    )
    select w.word,
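The split, explode, and count steps named in the answer can be mirrored in plain Python to check the expected counts on the two sample sentences. This is only a sketch of the same idea, not the answer's query: `count_words` is a hypothetical name, and counting is case-sensitive here, which is an assumption.

```python
import re
from collections import Counter


def count_words(sentences):
    """Count words across sentences, splitting on runs of non-letter characters."""
    counts = Counter()
    for sentence in sentences:
        # One "exploded" item per word, as lateral view + explode would yield rows.
        for word in re.split(r"[^A-Za-z]+", sentence):
            if word:  # the split leaves empty strings at the edges; drop them
                counts[word] += 1
    return counts


word_counts = count_words(["Hey, how are you?", "Hey, Who is there?"])
for word, n in word_counts.items():
    print(word, n)
```

Lowercasing each sentence before splitting would merge variants such as "How" and "how", if that is the grouping wanted.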

Look up a list of values in the ranges (bins) as defined by two columns in another table and get the corresponding value from the third column

Question: Hello, I have two tables, T1 and T2. T1 has a column of integer values, and T2 has ranges defined by two columns and a corresponding value for each range, something like this:

    range_min  range_max  corr_value
    5          10         1020
    11         15         5000

Suppose I want to get the "value" from T2 for each integer in T1, depending on which range the integer falls into. Say I have 6, 7, and 12 in T1. Then the ideal result would look like this:

    integer_val  corr_value
    6            1020
    7            1020
    12           5000

Note that I don
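The lookup described in the question can be sketched in plain Python to confirm the expected result on the sample data. The function name `lookup_ranges` is hypothetical, the range bounds are assumed to be inclusive, and values outside every range map to `None` (an assumption, similar to what a left join would give).

```python
def lookup_ranges(values, ranges):
    """Pair each value with the corr_value of the range that contains it.

    ranges is a list of (range_min, range_max, corr_value) tuples,
    with both bounds treated as inclusive (an assumption).
    """
    result = []
    for value in values:
        match = None  # stays None when no range contains the value
        for range_min, range_max, corr_value in ranges:
            if range_min <= value <= range_max:
                match = corr_value
                break
        result.append((value, match))
    return result


t2 = [(5, 10, 1020), (11, 15, 5000)]
for integer_val, corr_value in lookup_ranges([6, 7, 12], t2):
    print(integer_val, corr_value)
```

The linear scan is fine for a handful of ranges; with many non-overlapping ranges, sorting by `range_min` and using binary search would be the usual refinement.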

Add missing monthly rows

Question: I would like to list the missing dates between two dates in a query. For example, my data:

    YEAR_MONTH | AMOUNT
    202001     | 500
    202001     | 600
    201912     | 100
    201910     | 200
    201910     | 100
    201909     | 400
    201601     | 5000

I want the query to return:

    201912 | 100
    201911 | 0
    201910 | 300
    201909 | 400
    201908 | 0
    201907 | 0
    201906 | 0
    ....   | 0
    201712 | 0

I want the last 24 months from the date of execution. I did something similar with dates, but not with YEAR_MONTH in yyyyMM format:

    select date_sub(s.date_order ,nvl(d.i,0)) as
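The gap-filling idea in the question (generate every month in the window, sum the amounts per month, and use 0 where a month has no rows) can be sketched in plain Python. The names `last_months` and `fill_missing` are hypothetical, and the window's end month is passed in explicitly because the question does not say whether the month of execution itself is included.

```python
from collections import defaultdict


def last_months(end_year, end_month, n):
    """Return n yyyyMM keys, newest first, ending at (end_year, end_month)."""
    index = end_year * 12 + (end_month - 1)  # months counted from year 0
    keys = []
    for offset in range(n):
        year, month0 = divmod(index - offset, 12)
        keys.append(year * 100 + month0 + 1)
    return keys


def fill_missing(rows, end_year, end_month, n=24):
    """Sum amounts per yyyyMM and emit 0 for months without any rows."""
    totals = defaultdict(int)
    for year_month, amount in rows:
        totals[year_month] += amount
    return [(key, totals.get(key, 0)) for key in last_months(end_year, end_month, n)]


data = [(202001, 500), (202001, 600), (201912, 100), (201910, 200),
        (201910, 100), (201909, 400), (201601, 5000)]
for year_month, amount in fill_missing(data, 2019, 12, n=5):
    print(year_month, amount)
```

In a query, the same shape is a generated list of months left-joined to the aggregated data, with missing sums replaced by 0.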

How to move Hive data table to MySql?

Question: I would like to know how I can move data from Hive to MySQL. I have seen examples of how to move Hive data to Amazon DynamoDB, but not to an RDBMS like MySQL. Here is the example I saw with DynamoDB:

    CREATE EXTERNAL TABLE tbl1 ( name string, location string )
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "table", "dynamodb.column.mapping" = "name:name,location:location");

I would like to do the same but with MySQL instead. I

Refresh one hive table from another hive table

Question: I have a few Hive tables that I am bringing in from an RDBMS using Sqoop incremental imports every hour and staging them. I am joining these tables and creating new dimension tables. Whenever I bring new rows from the RDBMS into the Hive staging tables, I have to refresh the dimension tables. If there are no new rows, the dimension tables should not be refreshed. The Hive version I'm using does not have ACID features. I need some advice on how this could be achieved in Hive.

Answer 1: You can INSERT new

When creating a Hive table against a CSV saved in S3, do I absolutely have to order the fields in the order of the comma-separated values in the CSV rows?

Question: When creating a Hive table against a CSV saved in S3, do I absolutely have to order the fields in the order of the comma-separated values in the CSV rows? The CSV has a header as its first row. I understand that CSV is row-based, not columnar, but I was wondering if there is a way to match the header values with the field names of the Hive table and order the columns differently.

Answer 1: Yes, columns in the table definition (DDL) should be in the same order as in the underlying CSV files. You can skip header
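The point of the answer, that values are matched to declared columns by position and not by header name, can be illustrated with a small Python sketch. It is not Hive code: `read_positional` and its `skip_header_lines` parameter are hypothetical names chosen for this illustration.

```python
import csv
import io


def read_positional(csv_text, columns, skip_header_lines=1):
    """Label each CSV field by its position in `columns`, ignoring the header text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    data_rows = rows[skip_header_lines:]  # header lines are dropped, never consulted
    return [dict(zip(columns, row)) for row in data_rows]


sample = "name,age\npete,20\nsam,21\n"

# Declared in file order: values land under the right names.
print(read_positional(sample, ["name", "age"]))

# Declared in a different order: values are silently mislabeled.
print(read_positional(sample, ["age", "name"]))
```

The second call shows why the declared order has to follow the file: nothing in a positional reader looks at the header to correct the mapping.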

Using Impala get the count of consecutive trips

Question: Sample data:

    touristid|day
    ABC|1
    ABC|1
    ABC|2
    ABC|4
    ABC|5
    ABC|6
    ABC|8
    ABC|10

The output should be:

    touristid|trip
    ABC|4

The logic behind 4 is the count of runs of distinct consecutive days: 1,1,2 is the 1st, then 4,5,6 is the 2nd, then 8 is the 3rd, and 10 is the 4th. I want this output using an Impala query.

Answer 1: Get the previous day using the lag() function, calculate new_trip_flag if day - prev_day > 1, then count(new_trip_flag). Demo:

    with table1 as (
      select 'ABC' as touristid, 1 as day union all
      select 'ABC' as
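The previous-day comparison described in the answer can be traced in plain Python on the sample data. This is a sketch of the same logic, not the answer's Impala query: `count_trips` is a hypothetical name, and treating the first day of each tourist (which has no previous day) as the start of a trip is an assumption needed to reach the expected count of 4.

```python
from collections import defaultdict


def count_trips(rows):
    """Count runs of consecutive days per tourist; rows are (touristid, day) pairs."""
    days_by_tourist = defaultdict(list)
    for touristid, day in rows:
        days_by_tourist[touristid].append(day)

    trips = {}
    for touristid, days in days_by_tourist.items():
        count = 0
        prev_day = None  # plays the role of lag(day) over the ordered days
        for day in sorted(days):
            # A new trip starts on the first day or after a gap of more than one day.
            if prev_day is None or day - prev_day > 1:
                count += 1
            prev_day = day
        trips[touristid] = count
    return trips


sample_rows = [("ABC", d) for d in (1, 1, 2, 4, 5, 6, 8, 10)]
print(count_trips(sample_rows))
```

Repeated days (the two rows with day 1) give a difference of 0 and therefore stay in the same trip.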