hiveql | 易学教程

SMB join not working over Hive Tables

Question: While performing an SMB join over two ORC tables, both bucketed and sorted on subscription_id, the join fails with the error below:
    Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:210)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred

Removing special characters using Hive

Question: I have data stored in Cassandra 1.2 as shown below. There is a special character under sValue, highlighted in bold. How can I use a Hive function to remove it?
    Date | Timestam | payload_Timestamp | actDate | actHour | actMinute | sDesc | sName | sValue
    ---------------------------------+--------------------------------------+--------------------------+----------------------+----------------------+------------------------+---------------------------+--------------------------------+------------

How to find the sum of value based on Adjustments in Impala query

Question: I have an Impala table named REV with a wire_code, an amount (amt), and a Reporting_line for each wire code.
    +---------+------+----------------+
    |wire_code| amt  | Reporting_line |
    +---------+------+----------------+
    | abc     | 100  | Database       |
    +---------+------+----------------+
    | abc     | 10   | Revenue        |
    +---------+------+----------------+
    | def     | 50   | Database       |
    +---------+------+----------------+
    | def     | 25   | Polland        |
    +---------+------+----------------+
    | ghi     | 250  | Cost           |
    +---------+------+---------------

Date variable in Hive

Question: I am using the following code to set a date in Hive:
    SET DATE_DM2=date_sub(from_unixtime(unix_timestamp(),'yyyy/MM/dd'), cast(((from_unixtime(unix_timestamp(), 'u') % 7)+1) as int));
But when I run the following select statement, I get no output:
    select * from TableName where partitiondate='${DATE_DM2}';
Is there anything wrong with the syntax?
Answer 1: The correct syntax is:
    select * from TableName where partitiondate='${hiveconf:DATE_DM2}';
Source: https://stackoverflow.com/questions
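As a rough sketch of a related approach (not part of the original answer): Hive variable substitution is textual, so a value set with SET is pasted into the query as written rather than evaluated first. One common workaround is to compute the date in the shell and pass it in with --hiveconf. The helper name build_query, the use of today's date, and the yyyy-MM-dd partition format below are assumptions for illustration only; the week-based arithmetic from the question is not reproduced.

```shell
#!/bin/sh
# Hypothetical helper: prints the query text for the table named in $1.
# The \$ keeps ${hiveconf:DATE_DM2} literal so that Hive, not the shell,
# performs the substitution.
build_query() {
  printf "select * from %s where partitiondate='\${hiveconf:DATE_DM2}';" "$1"
}

# Assumption: partitions are named yyyy-MM-dd and today's date is wanted.
partition_date=$(date +%Y-%m-%d)

# Only call hive when the CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  hive --hiveconf DATE_DM2="$partition_date" -e "$(build_query TableName)"
else
  echo "hive CLI not found; would run with DATE_DM2=$partition_date"
fi
```

Computing the value outside Hive keeps the variable a plain string, which is all that ${hiveconf:...} substitution can carry.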

Creating a hive table with ~40K columns

Question: I'm trying to create a fairly large table in Hive: ~3 million rows and ~40K columns. To begin, I create an empty table and then insert the data into it. However, I hit an error when trying this:
    Unable to acquire IMPLICIT, SHARED lock default after 100 attempts.
    FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquire. retry after some time
The query is pretty straightforward:
    create external table database.dataset ( var1 decimal(10,2), var2 decimal(10

How to compute the intersections and unions of two arrays in Hive?

Question: For example, the intersection
    select intersect(array("A","B"), array("B","C"))
should return ["B"], and the union
    select union(array("A","B"), array("B","C"))
should return ["A","B","C"]. What's the best way to do this in Hive? I have checked the Hive documentation but cannot find any relevant information.
Answer 1: The solution to your problem is here. Follow the GitHub link, where Klout has published many UDFs. Download the source, build the JAR, and add the JAR to Hive. Example CREATE

Hive - multiple (average) count distincts over layered groups

Question: Given the following source data (say the table name is user_activity):
    +---------+-----------+------------+
    | user_id | user_type | some_date  |
    +---------+-----------+------------+
    | 1       | a         | 2018-01-01 |
    | 1       | a         | 2018-01-02 |
    | 2       | a         | 2018-01-01 |
    | 3       | a         | 2018-01-01 |
    | 4       | b         | 2018-01-01 |
    | 4       | b         | 2018-01-02 |
    | 5       | b         | 2018-01-02 |
    +---------+-----------+------------+
I'd like to get the following result:
    +-----------+------------+---------------------+
    | user_type | user_count |

convert normal column as partition column in hive

Question: I have a table with 3 columns. Now I need to turn one of the columns into a partition column. Is that possible? If not, how can I add a partition to an existing table? I used the syntax below:
    create table t1 (eno int, ename string ) row format delimited fields terminated by '\t';
    load data local '/....path/' into table t1;
    alter table t1 add partition (p1='india');
I am getting errors. Does anyone know how to add a partition to an existing table? Thanks in advance.
Answer 1: I don't

How can I export view data in hive?

Question: I have created 4 tables (a, b, c, d) in Hive and created a view (x) on top of those tables by joining them.
-- How can I export the CSV data underlying x from HDFS to local?
-- How can I keep this CSV in HDFS?
For tables, we can run show create table a; which shows the HDFS location where the underlying CSV is stored:
    hadoop fs get --from source_path_and_file --to dest_path_and_file
Similarly, how can I get the CSV data from the view onto my local machine?
Answer 1: You can export view data to the CSV

How to check whether a partition exists with hive

Question: I have a HiveQL script that performs some operations on a Hive table. Before running them, I want to check whether the required partition exists and, if it does not, terminate the script. How can I achieve this?
Answer 1: Using shell:
    table_name="schema.table"
    partition_spec="key=value"
    partition_exists=$(hive -e "show partitions $table_name" | grep "$partition_spec");
    #check partition_exists
    if [ "$partition_exists" = "" ]; then echo not exists; else echo exists; fi
Source: https:/
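One caveat with the plain grep above is that it matches substrings, so key=value would also match a partition named key=value2. A minimal sketch of a stricter variant follows, assuming the partition listing prints one full partition spec per line; the function name partition_in_list and the sample dt= values are hypothetical.

```shell
#!/bin/sh
# Succeeds only if a line on stdin equals the partition spec in $1 exactly.
# -F: treat the spec as a fixed string, -x: match whole lines, -q: no output.
partition_in_list() {
  grep -Fxq -- "$1"
}

# Demonstration on a canned listing; with a real table the input would come
# from: hive -e "show partitions $table_name"
if printf 'dt=2018-01-01\ndt=2018-01-02\n' | partition_in_list "dt=2018-01-02"; then
  echo exists
else
  echo not exists
fi
```

For a table with several partition keys, the spec passed in would need to be the full line as printed (for example key1=a/key2=b), since the match is on whole lines.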