hiveql

Delete column in hive table

Question: I am working with Hive version 0.9 and I need to delete columns of a Hive table. I have searched several manuals of Hive commands, but I have only found commands for version 0.14. Is it possible to delete a column of a Hive table in Hive version 0.9? What is the command? Thanks.

Answer 1: We can't simply drop a column from a Hive table with a SQL-style statement such as

ALTER TABLE tbl_name DROP COLUMN column_name;   -- this will not work

but there is a shortcut to drop columns from a Hive table.
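
A minimal sketch of that shortcut, REPLACE COLUMNS, which also exists in Hive 0.9. The table and column names below are made up for illustration:

-- Suppose the table currently has columns (id INT, name STRING, obsolete_col STRING)
-- and obsolete_col should go away. Re-declare the column list without it:
ALTER TABLE tbl_name REPLACE COLUMNS (id INT, name STRING);

Note that this only rewrites the table metadata; the data files are not rewritten, so for delimited text tables it works cleanly mainly when the removed column is the last one, otherwise the remaining columns shift against the data.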

Difference between `load data inpath` and `location` in Hive?

Question: At my firm, I see these two patterns used frequently, and I'd like to understand the differences, because their functionality seems the same to me:

1) create table <mytable> (name string, number double);
   load data inpath '/directory-path/file.csv' into table <mytable>;

2) create table <mytable> (name string, number double)
   location '/directory-path/file.csv';

They both get the data from the directory on HDFS into the directory for the table in Hive. Are there differences that one should be aware of?
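
A hedged side-by-side sketch of the two patterns; the table names, the ROW FORMAT clause and the EXTERNAL keyword are illustrative assumptions, not part of the question. The key behavioural difference is that LOAD DATA INPATH moves the HDFS file into the table's directory, while LOCATION points the table at an existing directory and moves nothing:

-- Pattern 1: managed table + LOAD; the file is MOVED from /directory-path/
-- into the table's directory under the Hive warehouse.
CREATE TABLE mytable (name STRING, number DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/directory-path/file.csv' INTO TABLE mytable;

-- Pattern 2: the table is defined on top of an existing directory; no data moves,
-- and Hive reads whatever files are (or later appear) in that directory.
CREATE EXTERNAL TABLE mytable2 (name STRING, number DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/directory-path/';

Also note that LOCATION expects a directory rather than a single file.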

HIVE: How to include null rows in lateral view explode

Question: I have a table as follows:

|user_id |email  |
|--------|-------|
|u1      |e1, e2 |
|u2      |null   |

My goal is to convert this into the following format:

|user_id |email |
|--------|------|
|u1      |e1    |
|u1      |e2    |
|u2      |null  |

For this I am using the lateral view explode() function in Hive, as follows:

select * from table
lateral view explode(split(email, ',')) email as email_id;

But doing this, the u2 row gets skipped because it has a null value in email. How can we include the nulls in the output too? Edit: as a workaround I am doing a union of this table
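
A minimal sketch of the usual fix, LATERAL VIEW OUTER (available from Hive 0.12); the table name my_table is a placeholder. explode() over a NULL or empty array produces no rows, and OUTER puts such rows back with NULL in the generated column:

SELECT user_id, email_id
FROM my_table
-- OUTER keeps u2 even though split(NULL, ',') gives explode() nothing to emit
LATERAL VIEW OUTER explode(split(email, ',')) e AS email_id;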

Date Difference less than 15 minutes in Hive

Below is my query; in the last line I am trying to check whether the difference between the dates is within 15 minutes. But whenever I run the below query:

SELECT TT.BUYER_ID, COUNT(*)
FROM (SELECT testingtable1.buyer_id, testingtable1.item_id, testingtable1.created_time
      FROM (SELECT user_id,
                   prod_and_ts.product_id AS product_id,
                   prod_and_ts.timestamps AS timestamps
            FROM testingtable2
            LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
            WHERE to_date(from_unixtime(CAST(prod_and_ts.timestamps AS BIGINT))) = '2012-07-09') prod_and_ts
      RIGHT OUTER JOIN (SELECT buyer_id, item_id,
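
A hedged sketch of the 15-minute check itself, independent of the join above. It assumes both columns hold unix epoch seconds (as the CAST ... AS BIGINT in the query suggests); the table alias and column names are illustrative:

-- 15 minutes = 900 seconds; keep rows whose two timestamps are at most that far apart
SELECT tt.buyer_id, COUNT(*)
FROM joined_result tt
WHERE abs(CAST(tt.created_time AS BIGINT) - CAST(tt.timestamps AS BIGINT)) <= 15 * 60
GROUP BY tt.buyer_id;

-- If the columns were formatted date strings instead, convert them first:
-- WHERE abs(unix_timestamp(created_time) - unix_timestamp(timestamps)) <= 15 * 60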

Semantic exception error in HIVE while using last_value window function

I have a table with the following data:

|dt         |device   |id            |count |
|-----------|---------|--------------|------|
|2018-10-05 |computer |7541185957382 |6     |
|2018-10-20 |computer |7541185957382 |3     |
|2018-10-14 |computer |7553187775734 |6     |
|2018-10-17 |computer |7553187775734 |10    |
|2018-10-21 |computer |7553187775734 |2     |
|2018-10-22 |computer |7549187067178 |5     |
|2018-10-20 |computer |7553187757256 |3     |
|2018-10-11 |computer |7549187067178 |10    |

I want to get the last and first dt for each id. Hence, I used the window functions first_value and last_value as follows:

select id,
       last_value(dt) over (partition by id order by dt) last_dt
from table
order by id;

But I am getting this error:
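
The error message is not quoted above, so the following is an assumption about a common cause: last_value() with only ORDER BY in the OVER clause uses a window frame that ends at the current row, and some Hive versions also reject the implied RANGE frame. Spelling out the frame, or avoiding window functions entirely, is a typical way out; my_table is a placeholder name:

SELECT id,
       first_value(dt) OVER (PARTITION BY id ORDER BY dt) AS first_dt,
       last_value(dt)  OVER (PARTITION BY id ORDER BY dt
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_dt
FROM my_table;

-- Equivalent and simpler for just "first and last dt per id":
SELECT id, MIN(dt) AS first_dt, MAX(dt) AS last_dt
FROM my_table
GROUP BY id;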

Loading more records than actual in Hive

While inserting from one Hive table into another Hive table, it loads more records than actually exist. Can anyone help with this weird behaviour of Hive? My query looks like this:

insert overwrite table_a
select col1, col2, col3, ... from table_b;

My table_b consists of 6405465 records. After inserting from table_b into table_a, I found the total record count in table_a is 6406565. Can anyone please help here?

If hive.compute.query.using.stats=true, then the optimizer uses statistics for the query result instead of querying the table data. This is much faster because the metastore is a fast database like MySQL.
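
A short sketch of the two usual remedies when the stats-based answer is stale (no new settings beyond the one already named above):

-- Force the count to scan real data instead of metastore statistics:
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM table_a;

-- Or refresh the statistics so the fast, stats-based answer is correct again:
ANALYZE TABLE table_a COMPUTE STATISTICS;
SELECT COUNT(*) FROM table_a;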

How to generate all n-grams in Hive

I'd like to create a list of n-grams using HiveQL. My idea was to use a regex with a lookahead together with the split function; this does not work, though:

select split('This is my sentence', '(\\S+) +(?=(\\S+))');

The input is a column of the form

|sentence                 |
|-------------------------|
|This is my sentence      |
|This is another sentence |

and the output is supposed to be:

["This is","is my","my sentence"]
["This is","is another","another sentence"]

There is an ngrams UDF in Hive, but that function directly calculates the frequency of the n-grams; I'd like to have a list of all the n-grams instead. Thanks a lot.
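
split() cannot do this, because the text matched as the delimiter is consumed, so overlapping pairs are lost. Below is a hedged alternative sketch: explode the words together with their positions and pair each word with its successor. It assumes a source table called docs with a sentence column and collect_list() (Hive 0.13+); collect_list() does not strictly guarantee ordering:

SELECT sentence,
       collect_list(concat(w1, ' ', w2)) AS bigrams
FROM (
  SELECT s.sentence, a.pos1, a.w1, b.w2
  FROM docs s
  LATERAL VIEW posexplode(split(s.sentence, ' ')) a AS pos1, w1
  LATERAL VIEW posexplode(split(s.sentence, ' ')) b AS pos2, w2
  WHERE b.pos2 = a.pos1 + 1          -- keep only adjacent word pairs
) t
GROUP BY sentence;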

Hive scanning entire data for bucketed table

Question: I was trying to optimize a Hive SQL query by bucketing the data on a single column. I created the table with the following statement:

CREATE TABLE `source_bckt`(
  `uk` string,
  `data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS;

Then I inserted the data after executing set hive.enforce.bucketing=true;. When I run the following select:

select * from source_bckt where uk='1179724';

the entire table is scanned, even though the data is supposed to be in a single file that can be identified by the equation HASH(
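
For context, predicate-based bucket pruning only arrived much later (Hive 2.x on Tez), so on older versions a plain WHERE uk='...' still scans every bucket file. A hedged workaround sketch uses TABLESAMPLE on the clustering column; the bucket number 3 below is purely a placeholder and would have to be computed as (hash('1179724') % 10) + 1 for the real key:

SELECT *
FROM source_bckt TABLESAMPLE(BUCKET 3 OUT OF 10 ON uk)
WHERE uk = '1179724';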

Hive shell throws FileNotFoundException while executing queries, in spite of adding jar files using "ADD JAR"

1) I have added the SerDe jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;"
2) I create the table.
3) The table is created successfully.
4) But when I execute any select query, it throws a FileNotFoundException:

hive> select count(*) from tab_tweets;
Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set
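
Two things commonly suggested for this symptom, sketched below as assumptions about the environment rather than a confirmed fix: verify that the jar still exists at the path that was added (a task-side FileNotFoundException typically means it was moved or deleted after ADD JAR), and, for a permanent setup, register the jar outside the session:

-- Re-add the jar in the same session that runs the query and confirm registration:
ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;
LIST JARS;

-- Alternatively, make it available to every session, e.g. by copying it into
-- $HIVE_HOME/lib or by pointing hive.aux.jars.path at it in hive-site.xml
-- (the exact mechanism depends on the Hive installation).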

Creating hive partitions for multiple months using one script

Question: I have data for 4 years, like '2011 2012 2013 2014', and I have to run queries based on one month's data. So I am creating partitions as below:

ALTER TABLE table1_2010Jan ADD PARTITION(year='2010', month='01', day='01') LOCATION 'path';
ALTER TABLE table1_2010Jan ADD PARTITION(year='2010', month='01', day='02') LOCATION 'path';
ALTER TABLE table1_2010Jan ADD PARTITION(year='2010', month='01', day='03') LOCATION 'path';

I am creating individual partitions like the above for every day of every month.
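
Two hedged alternatives to one ALTER TABLE per day. The first relies on ADD PARTITION accepting several partition specs in a single statement (Hive 0.8+); the second assumes the data can be (re)loaded through Hive, so dynamic partitioning can derive year/month/day from the rows themselves. Table, column, and path names below are placeholders:

-- One statement, many partitions:
ALTER TABLE table1_2010Jan ADD
  PARTITION (year='2010', month='01', day='01') LOCATION 'path/2010/01/01'
  PARTITION (year='2010', month='01', day='02') LOCATION 'path/2010/01/02'
  PARTITION (year='2010', month='01', day='03') LOCATION 'path/2010/01/03';

-- Dynamic partitioning, if the source rows carry year/month/day columns:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE table1 PARTITION (year, month, day)
SELECT col1, col2, year, month, day FROM staging_table;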