hiveql

How does Hive 'alter table <table name> concatenate' work?

Submitted by ε祈祈猫儿з on 2019-12-22 01:30:32
Question: I have n (a large number) of small ORC files that I want to merge into k (a small number) of large ORC files. This is done with the ALTER TABLE table_name CONCATENATE command in Hive. I want to understand how Hive implements this, because I'm looking to reimplement it in Spark, with changes if required. Any pointers would be great.

Answer 1: Per AlterTable/PartitionConcatenate: if the table or partition contains many small RCFiles or ORC files, then the above command will merge them into …
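For reference, a hedged sketch of the command in both forms (the table name sales and partition column dt are hypothetical). My understanding is that for ORC, Hive performs a stripe-level merge: existing ORC stripes are copied into fewer, larger files without decoding individual rows, which is why it is cheaper than rewriting the data through a query.

```sql
-- Merge the small ORC files backing the whole table.
ALTER TABLE sales CONCATENATE;

-- Merge only the files backing one partition.
ALTER TABLE sales PARTITION (dt = '2019-12-01') CONCATENATE;
```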

Hive query performance is slow when using Hive date functions instead of hardcoded date strings?

Submitted by 徘徊边缘 on 2019-12-21 20:54:08
Question: I have a transaction table table_A that gets updated every day. Every day I insert new data into table_A from external table table_B, using the file_date field to filter the rows to insert. However, there is a huge performance difference between using a hardcoded date and using Hive date functions:

    -- Fast version (~20 minutes)
    SET date_ingest = '2016-12-07';
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.exec.dynamic.partition = TRUE;
    INSERT …
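One plausible cause (an assumption here, since the slow version of the query is cut off) is that a literal date string lets Hive prune partitions at compile time, while a date-function expression may not be folded to a constant, forcing a wider scan. A workaround is to resolve the date before the query is compiled, for example with a substitution variable:

```sql
-- Sketch: pass the ingest date as a hivevar so the filter is a plain constant
-- by compile time. Invocation: hive --hivevar ingest_date='2016-12-07' -f load.hql
-- table_A, table_B, and file_date come from the question.
INSERT INTO TABLE table_A
SELECT *
FROM table_B
WHERE file_date = '${hivevar:ingest_date}';
```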

What is the replacement of NULLIF in Hive?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-21 18:32:07
Question: What is the replacement for NULLIF in Hive? I am using COALESCE, but it does not meet my requirement. My query is something like COALESCE(A, B, C) AS D. COALESCE returns the first non-NULL value, but my A/B/C contain blank values, and COALESCE treats a blank string as NOT NULL, so the intended value is not assigned to D. In SQL I could use COALESCE(NULLIF(A, ''), …) so it checks for blanks as well. I tried CASE …
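On Hive versions that lack NULLIF (it was added in Hive 2.3.0, as far as I know), the same check can be written with IF. A sketch using the question's column names (the table name t is a placeholder):

```sql
-- Treat empty strings as NULL, then take the first remaining value.
SELECT COALESCE(
         IF(A = '', NULL, A),
         IF(B = '', NULL, B),
         IF(C = '', NULL, C)
       ) AS D
FROM t;

-- On Hive 2.3.0+ the SQL form from the question works directly:
-- SELECT COALESCE(NULLIF(A, ''), NULLIF(B, ''), NULLIF(C, '')) AS D FROM t;
```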

How do you insert data into complex data type “Struct” in Hive

Submitted by 与世无争的帅哥 on 2019-12-21 16:58:09
Question: I'm completely new to Hive and Stack Overflow. I'm trying to create a table with the complex data type STRUCT and then populate it using INSERT INTO TABLE in Hive. I'm using the following code:

    CREATE TABLE struct_test (
      address STRUCT< houseno: STRING, streetname: STRING, town: STRING, postcode: STRING >
    );
    INSERT INTO TABLE struct_test
    SELECT NAMED_STRUCT('123', 'GoldStreet', London', W1a9JF') AS address
    FROM dummy_table LIMIT 1;

I get the following error: Error while compiling statement: …
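Since the answer is truncated, here is a hedged guess at the fix rather than the accepted answer's wording: named_struct expects alternating name/value arguments, and two of the string literals above are missing their opening quote.

```sql
CREATE TABLE struct_test (
  address STRUCT< houseno: STRING, streetname: STRING, town: STRING, postcode: STRING >
);

INSERT INTO TABLE struct_test
SELECT NAMED_STRUCT('houseno', '123',
                    'streetname', 'GoldStreet',
                    'town', 'London',
                    'postcode', 'W1a9JF') AS address
FROM dummy_table
LIMIT 1;
```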

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

Submitted by 家住魔仙堡 on 2019-12-21 10:19:09
Question: There is an issue when executing SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. SHOW CREATE TABLE gives you this:

    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with those clauses, you then get a casting error when selecting. The error looks like: Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop …
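The truncated answer likely hinges on the SerDe: the two format clauses alone leave the table on the default LazySimpleSerDe, which cannot handle the OrcStruct rows the input format produces. A hedged sketch of both fixes (the single column id INT is a placeholder):

```sql
-- Fix 1: include the ROW FORMAT SERDE line alongside the format clauses.
CREATE TABLE orc_copy (id INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- Fix 2: the shorthand, which sets SerDe, input format, and output format at once.
-- CREATE TABLE orc_copy (id INT) STORED AS ORC;
```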

Is Hive's collect_list ordered?

Submitted by 做~自己de王妃 on 2019-12-21 05:39:05
Question: This page says of collect_list: "Returns a list of objects with duplicates." Is that list ordered? For example, in the order of the query results?

Answer 1: The built-in collect_list isn't guaranteed to be ordered, even if you do an ORDER BY first (and even if it did preserve order, doing it that way would be a waste of time). Just use Brickhouse's collect; it ensures the elements are ordered.

Answer 2: It's correct that collect_list isn't guaranteed to be ordered. The function sort_array will sort the result: select a, b, …
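The sort_array query in Answer 2 is cut off; a plausible completion (columns a and b come from the snippet, the table name t is a placeholder):

```sql
-- sort_array orders the collected elements ascending, giving a deterministic
-- result even though collect_list itself makes no ordering promise.
SELECT a, sort_array(collect_list(b)) AS b_list
FROM t
GROUP BY a;
```

Note that this yields sorted order, not the query's row order; only the Brickhouse approach from Answer 1 preserves the latter.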

Unable to connect to HIVE2 via JAVA

Submitted by 感情迁移 on 2019-12-21 05:13:20
Question: Referring to the Hive2 documentation, I created a simple Java program to connect to a HiveServer2 instance (not local). I have also added all of the jars mentioned in the above link to the classpath in Eclipse; however, when I run the code it throws an error:

    09:42:35,580 INFO Utils:285 - Supplied authorities: hdstg-c01-edge-03:20000
    09:42:35,583 INFO Utils:372 - Resolved authority: hdstg-c01-edge-03:20000
    09:42:35,656 INFO HiveConnection:189 - Will try to open client transport with JDBC Uri: jdbc:hive2://hdstg-c01-edge-03 …

Is LIMIT clause in HIVE really random?

Submitted by 落花浮王杯 on 2019-12-21 04:25:15
Question: The Hive documentation notes that the LIMIT clause returns rows "chosen at random". I have been running a SELECT on a table with more than 800,000 records with LIMIT 1, but it always returns the same record. I'm using the Shark distribution, and I am wondering whether that has anything to do with this unexpected behavior. Any thoughts would be appreciated. Thanks, Visakh

Answer 1: Even though the documentation states it returns rows at random, that's not actually true. It returns …
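If a genuinely random row is wanted, LIMIT alone will not provide it; in practice it takes the first rows the plan happens to produce. A common pattern, offered here as a sketch rather than the truncated answer's wording (big_table is a placeholder):

```sql
SELECT *
FROM big_table
DISTRIBUTE BY rand()   -- spread rows randomly across reducers
SORT BY rand()         -- randomize order within each reducer
LIMIT 1;
```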

Add a column in a table in HIVE QL

Submitted by Deadly on 2019-12-20 08:57:44
Question: I'm writing code in HiveQL to create a table consisting of 1300 rows and 6 columns:

    CREATE TABLE test1 AS
    SELECT cd_screen_function,
           SUM(access_count) AS max_count,
           MIN(response_time_min) AS response_time_min,
           AVG(response_time_avg) AS response_time_avg,
           MAX(response_time_max) AS response_time_max,
           SUM(response_time_tot) AS response_time_tot,
           COUNT(*) AS row_count
    FROM sheet
    WHERE ts_update BETWEEN unix_timestamp('2012-11-01 00:00:00')
                        AND unix_timestamp('2012-11-30 00:00:00')
      AND cd_office = …
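If the goal implied by the title is to add a column to the finished table, Hive handles that with ALTER TABLE as a metadata-only change (the column name and comment below are hypothetical):

```sql
-- Existing rows read back NULL for the new column; no data files are rewritten.
ALTER TABLE test1 ADD COLUMNS (report_month STRING COMMENT 'hypothetical example column');
```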

SparkR from Rstudio - gives Error in invokeJava(isStatic = TRUE, className, methodName, …) :

Submitted by 戏子无情 on 2019-12-20 05:00:10
Question: I am using RStudio. After creating a session, if I try to create a DataFrame from R data, it gives an error:

    Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7")
    Sys.setenv(HADOOP_HOME = "E:/winutils")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    Sys.setenv('SPARKR_SUBMIT_ARGS'='"sparkr-shell"')
    library(SparkR)
    sparkR.session(sparkConfig = list(spark.sql.warehouse.dir="C:/Temp"))
    localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c …