hiveql

Delete column in hive table

Question: I am working with Hive version 0.9 and I need to delete columns of a Hive table. I have searched several manuals of Hive commands, but I have only found commands for version 0.14. Is it possible to delete a column of a Hive table in Hive version 0.9? What is the command? Thanks.

Answer 1: We can't simply drop a column from a Hive table with a SQL-style statement such as

ALTER TABLE tbl_name DROP COLUMN column_name;   -- this will not work

but there is a shortcut to drop columns from a Hive table.
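
A minimal sketch of that shortcut, REPLACE COLUMNS, which also exists in Hive 0.9. The table and column names below are made up for illustration:

-- Suppose the table currently has columns (id INT, name STRING, obsolete_col STRING)
-- and obsolete_col should go away. Re-declare the column list without it:
ALTER TABLE tbl_name REPLACE COLUMNS (id INT, name STRING);

Note that this only rewrites the table metadata; the data files are not rewritten, so for delimited text tables it works cleanly mainly when the removed column is the last one, otherwise the remaining columns shift against the data.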

Difference between `load data inpath` and `location` in Hive?

Question: At my firm, I see these two patterns used frequently, and I'd like to understand the differences, because their functionality seems the same to me:

1) create table <mytable> (name string, number double);
   load data inpath '/directory-path/file.csv' into table <mytable>;

2) create table <mytable> (name string, number double)
   location '/directory-path/file.csv';

They both get the data from the directory on HDFS into the directory for the table in Hive. Are there differences that one should be aware of?
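
A hedged side-by-side sketch of the two patterns; the table names, the ROW FORMAT clause and the EXTERNAL keyword are illustrative assumptions, not part of the question. The key behavioural difference is that LOAD DATA INPATH moves the HDFS file into the table's directory, while LOCATION points the table at an existing directory and moves nothing:

-- Pattern 1: managed table + LOAD; the file is MOVED from /directory-path/
-- into the table's directory under the Hive warehouse.
CREATE TABLE mytable (name STRING, number DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/directory-path/file.csv' INTO TABLE mytable;

-- Pattern 2: the table is defined on top of an existing directory; no data moves,
-- and Hive reads whatever files are (or later appear) in that directory.
CREATE EXTERNAL TABLE mytable2 (name STRING, number DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/directory-path/';

Also note that LOCATION expects a directory rather than a single file.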

HIVE: How to include null rows in lateral view explode

Question: I have a table as follows:

|user_id |email  |
|--------|-------|
|u1      |e1, e2 |
|u2      |null   |

My goal is to convert this into the following format:

|user_id |email |
|--------|------|
|u1      |e1    |
|u1      |e2    |
|u2      |null  |

For this I am using the lateral view explode() function in Hive, as follows:

select * from table
lateral view explode(split(email, ',')) email as email_id;

But doing this, the u2 row gets skipped because it has a null value in email. How can we include the nulls in the output too? Edit: as a workaround I am doing a union of this table
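
A minimal sketch of the usual fix, LATERAL VIEW OUTER (available from Hive 0.12); the table name my_table is a placeholder. explode() over a NULL or empty array produces no rows, and OUTER puts such rows back with NULL in the generated column:

SELECT user_id, email_id
FROM my_table
-- OUTER keeps u2 even though split(NULL, ',') gives explode() nothing to emit
LATERAL VIEW OUTER explode(split(email, ',')) e AS email_id;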

Date Difference less than 15 minutes in Hive

Below is my query; in the last line I am trying to check whether the difference between the dates is within 15 minutes. But whenever I run the below query:

SELECT TT.BUYER_ID, COUNT(*)
FROM (SELECT testingtable1.buyer_id, testingtable1.item_id, testingtable1.created_time
      FROM (SELECT user_id,
                   prod_and_ts.product_id AS product_id,
                   prod_and_ts.timestamps AS timestamps
            FROM testingtable2
            LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
            WHERE to_date(from_unixtime(CAST(prod_and_ts.timestamps AS BIGINT))) = '2012-07-09') prod_and_ts
      RIGHT OUTER JOIN (SELECT buyer_id, item_id,
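
A hedged sketch of the 15-minute check itself, independent of the join above. It assumes both columns hold unix epoch seconds (as the CAST ... AS BIGINT in the query suggests); the table alias and column names are illustrative:

-- 15 minutes = 900 seconds; keep rows whose two timestamps are at most that far apart
SELECT tt.buyer_id, COUNT(*)
FROM joined_result tt
WHERE abs(CAST(tt.created_time AS BIGINT) - CAST(tt.timestamps AS BIGINT)) <= 15 * 60
GROUP BY tt.buyer_id;

-- If the columns were formatted date strings instead, convert them first:
-- WHERE abs(unix_timestamp(created_time) - unix_timestamp(timestamps)) <= 15 * 60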

Semantic exception error in HIVE while using last_value window function

I have a table with the following data:

|dt         |device   |id            |count |
|-----------|---------|--------------|------|
|2018-10-05 |computer |7541185957382 |6     |
|2018-10-20 |computer |7541185957382 |3     |
|2018-10-14 |computer |7553187775734 |6     |
|2018-10-17 |computer |7553187775734 |10    |
|2018-10-21 |computer |7553187775734 |2     |
|2018-10-22 |computer |7549187067178 |5     |
|2018-10-20 |computer |7553187757256 |3     |
|2018-10-11 |computer |7549187067178 |10    |

I want to get the last and first dt for each id. Hence, I used the window functions first_value and last_value as follows:

select id,
       last_value(dt) over (partition by id order by dt) last_dt
from table
order by id;

But I am getting this error:
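
The error message is not quoted above, so the following is an assumption about a common cause: last_value() with only ORDER BY in the OVER clause uses a window frame that ends at the current row, and some Hive versions also reject the implied RANGE frame. Spelling out the frame, or avoiding window functions entirely, is a typical way out; my_table is a placeholder name:

SELECT id,
       first_value(dt) OVER (PARTITION BY id ORDER BY dt) AS first_dt,
       last_value(dt)  OVER (PARTITION BY id ORDER BY dt
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_dt
FROM my_table;

-- Equivalent and simpler for just "first and last dt per id":
SELECT id, MIN(dt) AS first_dt, MAX(dt) AS last_dt
FROM my_table
GROUP BY id;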

Loading more records than actual in Hive

While inserting from one Hive table into another Hive table, it loads more records than actually exist. Can anyone help with this weird behaviour of Hive? My query looks like this:

insert overwrite table_a
select col1, col2, col3, ... from table_b;

My table_b consists of 6405465 records. After inserting from table_b into table_a, I found the total record count in table_a is 6406565. Can anyone please help here?

If hive.compute.query.using.stats=true, then the optimizer uses statistics for the query result instead of querying the table data. This is much faster because the metastore is a fast database like MySQL.
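
A short sketch of the two usual remedies when the stats-based answer is stale (no new settings beyond the one already named above):

-- Force the count to scan real data instead of metastore statistics:
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM table_a;

-- Or refresh the statistics so the fast, stats-based answer is correct again:
ANALYZE TABLE table_a COMPUTE STATISTICS;
SELECT COUNT(*) FROM table_a;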

How to generate all n-grams in Hive

I'd like to create a list of n-grams using HiveQL. My idea was to use a regex with a lookahead together with the split function; this does not work, though:

select split('This is my sentence', '(\\S+) +(?=(\\S+))');

The input is a column of the form

|sentence                 |
|-------------------------|
|This is my sentence      |
|This is another sentence |

and the output is supposed to be:

["This is","is my","my sentence"]
["This is","is another","another sentence"]

There is an ngrams UDF in Hive, but that function directly calculates the frequency of the n-grams; I'd like to have a list of all the n-grams instead. Thanks a lot.
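
split() cannot do this, because the text matched as the delimiter is consumed, so overlapping pairs are lost. Below is a hedged alternative sketch: explode the words together with their positions and pair each word with its successor. It assumes a source table called docs with a sentence column and collect_list() (Hive 0.13+); collect_list() does not strictly guarantee ordering:

SELECT sentence,
       collect_list(concat(w1, ' ', w2)) AS bigrams
FROM (
  SELECT s.sentence, a.pos1, a.w1, b.w2
  FROM docs s
  LATERAL VIEW posexplode(split(s.sentence, ' ')) a AS pos1, w1
  LATERAL VIEW posexplode(split(s.sentence, ' ')) b AS pos2, w2
  WHERE b.pos2 = a.pos1 + 1          -- keep only adjacent word pairs
) t
GROUP BY sentence;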

Hive scanning entire data for bucketed table

Question: I was trying to optimize a Hive SQL query by bucketing the data on a single column. I created the table with the following statement:

CREATE TABLE `source_bckt`(
  `uk` string,
  `data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS;

Then I inserted the data after executing set hive.enforce.bucketing=true;. When I run the following select:

select * from source_bckt where uk='1179724';

the entire table is scanned, even though the data is supposed to be in a single file that can be identified by the equation HASH(
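
For context, predicate-based bucket pruning only arrived much later (Hive 2.x on Tez), so on older versions a plain WHERE uk='...' still scans every bucket file. A hedged workaround sketch uses TABLESAMPLE on the clustering column; the bucket number 3 below is purely a placeholder and would have to be computed as (hash('1179724') % 10) + 1 for the real key:

SELECT *
FROM source_bckt TABLESAMPLE(BUCKET 3 OUT OF 10 ON uk)
WHERE uk = '1179724';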

Hive shell throws FileNotFoundException while executing queries, in spite of adding jar files using "ADD JAR"

1) I have added the SerDe jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;"
2) I create the table.
3) The table is created successfully.
4) But when I execute any select query, it throws a FileNotFoundException:

hive> select count(*) from tab_tweets;
Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set
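
Two things commonly suggested for this symptom, sketched below as assumptions about the environment rather than a confirmed fix: verify that the jar still exists at the path that was added (a task-side FileNotFoundException typically means it was moved or deleted after ADD JAR), and, for a permanent setup, register the jar outside the session:

-- Re-add the jar in the same session that runs the query and confirm registration:
ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;
LIST JARS;

-- Alternatively, make it available to every session, e.g. by copying it into
-- $HIVE_HOME/lib or by pointing hive.aux.jars.path at it in hive-site.xml
-- (the exact mechanism depends on the Hive installation).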

Creating hive partitions for multiple months using one script

Question: I have data for 4 years, like '2011 2012 2013 2014', and I have to run queries based on one month's data. So I am creating partitions as below:

ALTER TABLE table1_2010Jan ADD PARTITION(year='2010', month='01', day='01') LOCATION 'path';
ALTER TABLE table1_2010Jan ADD PARTITION(year='2010', month='01', day='02') LOCATION 'path';
ALTER TABLE table1_2010Jan ADD PARTITION(year='2010', month='01', day='03') LOCATION 'path';

I am creating individual partitions like the above for every day of every month.
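
Two hedged alternatives to one ALTER TABLE per day. The first relies on ADD PARTITION accepting several partition specs in a single statement (Hive 0.8+); the second assumes the data can be (re)loaded through Hive, so dynamic partitioning can derive year/month/day from the rows themselves. Table, column, and path names below are placeholders:

-- One statement, many partitions:
ALTER TABLE table1_2010Jan ADD
  PARTITION (year='2010', month='01', day='01') LOCATION 'path/2010/01/01'
  PARTITION (year='2010', month='01', day='02') LOCATION 'path/2010/01/02'
  PARTITION (year='2010', month='01', day='03') LOCATION 'path/2010/01/03';

-- Dynamic partitioning, if the source rows carry year/month/day columns:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table1 PARTITION (year, month, day)
SELECT col1, col2, year, month, day FROM staging_table;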