hiveql | 易学教程

How to see the date when the table was created?

Question: I created a table a couple of months ago. Is there any way in Hive to see when the table was created? show tables doesn't give the table's creation date.

Answer 1: Execute the command desc formatted <database>.<table_name> on the Hive CLI. It will show detailed table information similar to:

Detailed Table Information
Database:
Owner:
CreateTime:
LastAccessTime:

Answer 2: You need to run the following command: describe formatted <your_table_name>; Or if you need this information about a

Using Impala get the count of consecutive trips

Sample data:

touristid|day
ABC|1
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10

The output should be:

touristid|trip
ABC|4

The logic behind 4 is the count of distinct sequences of consecutive days: 1,1,2 is the 1st, then 4,5,6 is the 2nd, then 8 is the 3rd and 10 is the 4th. I want this output using an Impala query.

Get the previous day using the lag() function, set new_trip_flag if day-prev_day>1, then count(new_trip_flag). Demo:

with table1 as (
select 'ABC' as touristid, 1 as day union all
select 'ABC' as touristid, 1 as day union all
select 'ABC' as touristid, 2 as day union all
select 'ABC' as touristid, 4 as day
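Since the demo query is cut off, here is a minimal sketch of the lag() approach it describes, run against an in-memory SQLite database as a stand-in for Impala (an assumption; it needs SQLite 3.25 or later for window functions). The subquery aliases with_prev and flagged are hypothetical, and the first row of each tourist is also flagged as a new trip because it has no previous day, which the description leaves implicit.

```python
import sqlite3

# Sample data from the question: one tourist, eight visit days.
rows = [("ABC", d) for d in (1, 1, 2, 4, 5, 6, 8, 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (touristid TEXT, day INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

query = """
SELECT touristid, COUNT(new_trip_flag) AS trip
FROM (
    SELECT touristid,
           -- A trip starts on the first row (no previous day) or after a gap.
           CASE WHEN prev_day IS NULL OR day - prev_day > 1 THEN 1 END
               AS new_trip_flag
    FROM (
        SELECT touristid, day,
               LAG(day) OVER (PARTITION BY touristid ORDER BY day) AS prev_day
        FROM table1
    ) AS with_prev
) AS flagged
GROUP BY touristid
"""

trips = conn.execute(query).fetchall()
print(trips)  # [('ABC', 4)]
```

COUNT skips NULLs, so only rows where the flag is set are counted; the duplicate day 1 yields a difference of 0 and does not start a new trip.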

Load complex json in hive using jsonserde

Question: I am trying to build a table in Hive for the following JSON:

{
  "business_id": "vcNAWiLM4dR7D2nwwJ7nCA",
  "hours": {
    "Tuesday": { "close": "17:00", "open": "08:00" },
    "Friday": { "close": "17:00", "open": "08:00" }
  },
  "open": true,
  "categories": [ "Doctors", "Health & Medical" ],
  "review_count": 9,
  "name": "Eric Goldberg, MD",
  "neighborhoods": [],
  "attributes": {
    "By Appointment Only": true,
    "Accepts Credit Cards": true,
    "Good For Groups": 1
  },
  "type": "business"
}

I can create a table using

Hive RegexSerDe

Question: I need to read data from a flat file. It contains a number of lines, but I only want to extract data from lines that look like:

REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644

I only want the 2nd, 3rd and 5th entries on this line, to put them into a Hive table. I issued this command but got an error:

create external table testTable ( tag string, a string, r string ) row format SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES( "input.regex" =
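The original input.regex is cut off, so the pattern below is purely an assumption. As a rough sketch of the idea, a whole-line pattern with one capture group per wanted entry picks out the 2nd, 3rd and 5th fields of the sample line. It is shown with Python's re module rather than Hive's Java regex engine, and the two dialects differ in some details.

```python
import re

# Sample line from the question.
line = "REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644"

# Hypothetical pattern: capture entries 2, 3 and 5; match but skip 4, 6 and 7.
pattern = re.compile(r"REVISION (\S+) (\S+) \S+ (\S+) \S+ \S+")

# fullmatch requires the pattern to cover the entire line.
match = pattern.fullmatch(line)
fields = match.groups() if match else None
print(fields)  # ('12', '30364918', '2005-12-06T17:44:47Z')
```

The three groups line up with the three declared columns (tag, a, r); lines that do not start with REVISION simply fail to match.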

Retrieve 3rd MAX salary in Hive

I'm a novice. I have the following Employee table with columns ID, Name, Country, Salary, ManagerID. I retrieved the 3rd max salary using the following:

select name, salary from ( select name, salary from employee sort by salary desc limit 3 ) result sort by salary limit 1;

How can I do the same to display the 3rd max salary for each country? Can we use OVER ( PARTITION BY country )? I tried looking at the LanguageManual Windowing and Analytics page, but I'm finding it difficult to understand. Please help!

You're definitely on the right track with windowing functions. row_number() is a good function to use here. select
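The answer's query is truncated, so the following is only a sketch of how row_number() with PARTITION BY country could be applied. It runs on an in-memory SQLite database as a stand-in for Hive (an assumption; SQLite 3.25 or later is needed), and the employee rows and the alias ranked are made up for illustration.

```python
import sqlite3

# Hypothetical employee rows: (id, name, country, salary, managerid).
employees = [
    (1, "Ann", "US", 900, None),
    (2, "Bob", "US", 800, 1),
    (3, "Cid", "US", 700, 1),
    (4, "Dee", "US", 600, 2),
    (5, "Eve", "IN", 500, None),
    (6, "Fay", "IN", 400, 5),
    (7, "Gus", "IN", 300, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(id INTEGER, name TEXT, country TEXT, salary INTEGER, managerid INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", employees)

query = """
SELECT country, name, salary
FROM (
    SELECT country, name, salary,
           -- Number rows within each country, highest salary first.
           ROW_NUMBER() OVER (PARTITION BY country ORDER BY salary DESC) AS rn
    FROM employee
) AS ranked
WHERE rn = 3
ORDER BY country
"""

third = conn.execute(query).fetchall()
print(third)  # [('IN', 'Gus', 300), ('US', 'Cid', 700)]
```

row_number() gives tied salaries different positions in an arbitrary order; if ties should share a position, dense_rank() is the usual alternative.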

Loading more records than actual in Hive

Question: While inserting from one Hive table into another, more records are loaded than actually exist. Can anyone help with this weird behaviour of Hive? My query looks like this:

insert overwrite table_a select col1,col2,col3,... from table_b;

My table_b consists of 6405465 records. After inserting from table_b into table_a, I found that table_a contains 6406565 records in total. Can anyone please help?

Answer 1: If hive.compute.query.using.stats=true; then the optimizer is using statistics for query

Difference between SAS merge and full outer join [duplicate]

Question:

Table t1:

person | visit | code_num1 | code_desc1
1 | 1 | 100 | OTD
1 | 2 | 101 | SED
2 | 3 | 102 | CHM
3 | 4 | 103 | OTD
3 | 4 | 103 | OTD
4 | 5 | 101 | SED

Table t2:

person | visit | code_num2 | code_desc2
1 | 1 | 104 | DME
1 | 6 | 104 | DME
3 | 4 | 103 | OTD
3 | 4 | 103 | OTD
3 | 7 | 103 | OTD
4 | 5 | 104 | DME

I have the following SAS code that merges the two tables t1 and t2 by person and visit:

DATA t3; MERGE t1 t2; BY person visit; RUN;

Which produces the

Regex SerDe doesn't support the serialize() method error

Question: I have a table structure as below.

CREATE TABLE db.TEST( f1 string, f2 string, f3 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{2})(.{3})(.{4})' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://nameservice1/location/TEST';

I tried to insert a record into the table as below.

insert overwrite table db.TEST2 select '12' as a ,
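To illustrate what the input.regex in this table definition does on the read side, here is a small sketch using Python's re module in place of Hive's Java regex engine. The sample line is hypothetical, and the snippet says nothing about the serialize() error itself.

```python
import re

# The input.regex from the table definition: fixed widths of 2, 3 and 4 characters.
pattern = re.compile(r"(.{2})(.{3})(.{4})")

# Hypothetical 9-character line, as it might appear in the underlying text file.
line = "12345abcd"

match = pattern.fullmatch(line)
f1, f2, f3 = match.groups()
print(f1, f2, f3)  # 12 345 abcd
```

Each capture group corresponds to one of the columns f1, f2 and f3, so the pattern only covers lines that are exactly nine characters long.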

“Could not get input splits” Error, with Hive-Cassandra-CqlStorageHandler

Question: I'm trying to read data from Cassandra using Hive with CqlStorageHandler. The versions: Hive 0.11.0, Hadoop 1.2.1, Cassandra 1.2.6. I'm able to create an EXTERNAL table with the following Hive query:

CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160")