hiveql

How to see the date when the table was created?

余生长醉 submitted on 2019-12-09 10:56:58
Question: I created a table a couple of months ago. Is there any way in Hive to see when the table was created? show tables doesn't give the creation date of the table.
Answer 1: Execute the command desc formatted <database>.<table_name> in the Hive CLI. It shows detailed table information similar to: Detailed Table Information Database: Owner: CreateTime: LastAccessTime:
Answer 2: You need to run the following command: describe formatted <your_table_name>; Or if you need this information about a

Using Impala get the count of consecutive trips

喜你入骨 submitted on 2019-12-08 18:42:34
Sample data:
touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10
The output should be:
touristid|trip
ABC|4
The logic behind 4 is the number of runs of distinct consecutive days: 1,1,2 is the 1st, then 4,5,6 is the 2nd, then 8 is the 3rd and 10 is the 4th. I want this output using an Impala query.
Get the previous day using the lag() function, calculate new_trip_flag if day-prev_day>1, then count(new_trip_flag). Demo: with table1 as ( select 'ABC' as touristid, 1 as day union all select 'ABC' as touristid, 1 as day union all select 'ABC' as touristid, 2 as day union all select 'ABC' as touristid, 4 as day
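
The counting rule described above (a new trip starts whenever the gap to the previous distinct day is greater than 1) can be checked outside Impala with a small Python sketch. This is only an illustration of the logic on the sample data, not the Impala query itself; the function name count_trips is made up for this example, and the first day is assumed to always open a trip.

```python
def count_trips(days):
    """Count runs of consecutive days, ignoring duplicate days."""
    trips = 0
    prev_day = None
    for day in sorted(set(days)):
        # Mirrors new_trip_flag: a trip starts on the first day seen,
        # or whenever the gap to the previous distinct day exceeds 1.
        if prev_day is None or day - prev_day > 1:
            trips += 1
        prev_day = day
    return trips


# Sample data for tourist ABC from the question.
abc_days = [1, 1, 2, 4, 5, 6, 8, 10]
print(count_trips(abc_days))  # 4
```

In SQL terms, prev_day plays the role of lag(day) over the tourist's days ordered by day.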

Load complex json in hive using jsonserde

*爱你&永不变心* submitted on 2019-12-08 14:10:40
Question: I am trying to build a table in Hive for the following JSON: { "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "hours": { "Tuesday": { "close": "17:00", "open": "08:00" }, "Friday": { "close": "17:00", "open": "08:00" } }, "open": true, "categories": [ "Doctors", "Health & Medical" ], "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "attributes": { "By Appointment Only": true, "Accepts Credit Cards": true, "Good For Groups": 1 }, "type": "business" } I can create a table using
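
Before choosing column types for a record like this, it can help to look at how deeply each field nests. The Python sketch below only inspects the sample record from the question; the describe helper is hypothetical, and the Python type names it prints are not Hive types. It does show that "hours" is a two-level nested object, that "attributes" mixes booleans with a number, and that some keys contain spaces, which may matter when deciding how to type those columns.

```python
import json

# The sample record from the question, unchanged.
raw = '''{ "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "hours": { "Tuesday": { "close": "17:00", "open": "08:00" }, "Friday": { "close": "17:00", "open": "08:00" } }, "open": true, "categories": [ "Doctors", "Health & Medical" ], "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "attributes": { "By Appointment Only": true, "Accepts Credit Cards": true, "Good For Groups": 1 }, "type": "business" }'''


def describe(value):
    """Replace every leaf value with the name of its Python type."""
    if isinstance(value, dict):
        return {key: describe(item) for key, item in value.items()}
    if isinstance(value, list):
        return [describe(item) for item in value]
    return type(value).__name__


shape = describe(json.loads(raw))
for field, kind in shape.items():
    print(field, "->", kind)
```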

Hive RegexSerDe

删除回忆录丶 submitted on 2019-12-08 10:46:58
Question: I need to read data from a flat file. It contains a number of lines, but I want to extract data only from lines that look like: REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644 I only want the 2nd, 3rd and 5th entries on this line, and I want to put them into a Hive table. I issued this command but get an error: create external table testTable ( tag string, a string, r string ) row format SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES( "input.regex" =
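
The regex in the question is cut off, so the following Python sketch only illustrates the idea of one capture group per wanted column. It assumes that REVISION counts as the 1st entry (so the 2nd, 3rd and 5th are 12, 30364918 and the timestamp) and that the pattern has to match the whole line. The pattern and the names LINE_PATTERN and extract_fields are written for this example; Hive uses Java regex syntax, so any pattern should be re-checked there.

```python
import re

# Three capture groups: 2nd, 3rd and 5th whitespace-separated entries.
# The 4th entry and everything after the 5th are matched but not captured.
LINE_PATTERN = re.compile(r"REVISION\s+(\S+)\s+(\S+)\s+\S+\s+(\S+)\s+.*")


def extract_fields(line):
    """Return the three captured entries, or None if the line does not match."""
    match = LINE_PATTERN.fullmatch(line)
    return match.groups() if match else None


sample = "REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644"
print(extract_fields(sample))
print(extract_fields("CATEGORY Anarchism"))  # a non-REVISION line: None
```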

Retrieve 3rd MAX salary in Hive

好久不见. submitted on 2019-12-08 09:06:45
I'm a novice. I have the following Employee table: ID Name Country Salary ManagerID. I retrieved the 3rd max salary using the following: select name, salary from ( select name, salary from employee sort by salary desc limit 3) result sort by salary limit 1; How can I do the same to display the 3rd max salary for each country? Can we use OVER ( PARTITION BY country )? I tried looking at the LanguageManual Windowing and Analytics page but I'm finding it difficult to understand. Please help!
You're definitely on the right track with windowing functions. row_number() is a good function to use here. select
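
The answer's query is cut off, so here is a rough Python sketch of what numbering rows per country by descending salary and keeping row 3 amounts to. The employee rows and the function name third_highest_per_country are invented for illustration. Like row_number(), this numbering does not collapse ties; if equal salaries should share a rank, dense_rank() is the usual alternative.

```python
from collections import defaultdict

# Hypothetical rows: (name, country, salary).
employees = [
    ("Ann", "US", 900), ("Bob", "US", 800), ("Cid", "US", 700), ("Dee", "US", 600),
    ("Eli", "IN", 500), ("Fay", "IN", 400), ("Gus", "IN", 300),
    ("Hal", "UK", 200),
]


def third_highest_per_country(rows):
    """Emulate row_number() over (partition by country order by salary desc) = 3."""
    by_country = defaultdict(list)
    for name, country, salary in rows:
        by_country[country].append((name, salary))

    result = {}
    for country, members in by_country.items():
        ranked = sorted(members, key=lambda member: member[1], reverse=True)
        for row_number, (name, salary) in enumerate(ranked, start=1):
            if row_number == 3:
                result[country] = (name, salary)
    return result


print(third_highest_per_country(employees))
```

Countries with fewer than three employees (UK in this made-up data) simply produce no row.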

Loading more records than actual in Hive

痞子三分冷 submitted on 2019-12-08 08:57:50
Question: While inserting from one Hive table into another, more records are loaded than actually exist. Can anyone help with this weird behaviour of Hive? My query looks like this: insert overwrite table_a select col1,col2,col3,... from table_b; table_b contains 6405465 records. After inserting from table_b into table_a, I found that table_a contains 6406565 records. Can anyone please help?
Answer 1: If hive.compute.query.using.stats=true; then optimizer is using statistics for query

Difference between SAS merge and full outer join [duplicate]

試著忘記壹切 submitted on 2019-12-08 08:26:30
Question: This question already has answers here: How to replicate a SAS merge (2 answers). Closed 4 years ago.
Table t1:
person | visit | code_num1 | code_desc1
1 | 1 | 100 | OTD
1 | 2 | 101 | SED
2 | 3 | 102 | CHM
3 | 4 | 103 | OTD
3 | 4 | 103 | OTD
4 | 5 | 101 | SED
Table t2:
person | visit | code_num2 | code_desc2
1 | 1 | 104 | DME
1 | 6 | 104 | DME
3 | 4 | 103 | OTD
3 | 4 | 103 | OTD
3 | 7 | 103 | OTD
4 | 5 | 104 | DME
I have the following SAS code that merges the two tables t1 and t2 by person and visit: DATA t3; MERGE t1 t2; BY person visit; RUN; Which produces the

Regex SerDe doesn't support the serialize() method error

邮差的信 submitted on 2019-12-08 08:22:49
Question: I have a table structure as below: CREATE TABLE db.TEST( f1 string, f2 string, f3 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{2})(.{3})(.{4})' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://nameservice1/location/TEST'; I tried to insert a record into the table as below: insert overwrite table db.TEST2 select '12' as a ,
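
The input.regex above describes a fixed-width layout: 2, 3 and 4 characters for f1, f2 and f3. The Python sketch below shows how that pattern splits a line in the read direction only, assuming the whole line must match. The sample string and the name split_fixed_width are invented for illustration. It says nothing about writing rows, which is the direction the error in the title is about.

```python
import re

# Same pattern as the table's input.regex: 2 + 3 + 4 characters.
FIXED_WIDTH = re.compile(r"(.{2})(.{3})(.{4})")


def split_fixed_width(line):
    """Return (f1, f2, f3) for a 9-character line, or None if it does not match."""
    match = FIXED_WIDTH.fullmatch(line)
    return match.groups() if match else None


print(split_fixed_width("12345abcd"))  # ('12', '345', 'abcd')
print(split_fixed_width("12345"))      # too short for the pattern: None
```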

“Could not get input splits” Error, with Hive-Cassandra-CqlStorageHandler

喜夏-厌秋 submitted on 2019-12-08 08:10:08
Question: I'm trying to read data from Cassandra using Hive with CqlStorageHandler. The versions: Hive 0.11.0, Hadoop 1.2.1, Cassandra 1.2.6. I'm able to create an EXTERNAL table with the following Hive query: CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160")