Hive

Hive/Impala performance with string partition key vs integer partition key

梦想的初衷 Submitted on 2021-02-07 19:53:16
Question: Are numeric columns recommended for partition keys? Will there be any performance difference between a select query on numeric-column partitions and one on string-column partitions?

Answer 1: No, there is no such recommendation. Consider this: a partition in Hive is represented as a folder with a name like 'key=value' (or sometimes just 'value'), but either way the folder name is a string. So the partition key is stored as a string and cast during read/write. The partition key value is not packed …
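The folder-naming behaviour the answer describes can be seen with a minimal sketch (the table and column names here are made up for illustration):

```sql
-- Each partition becomes a directory whose name is the string
-- 'key=value', regardless of the declared key type.
CREATE TABLE sales (amount DOUBLE)
PARTITIONED BY (year INT);

INSERT INTO sales PARTITION (year = 2021) VALUES (9.99);

-- On HDFS this partition lives under a path such as:
--   /user/hive/warehouse/sales/year=2021/
-- i.e. the INT key 2021 is serialized to the string "2021".
```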

Select if table exists in Apache Hive

坚强是说给别人听的谎言 Submitted on 2021-02-07 14:49:24
Question: I have a Hive query of the form: select . . . from table1 left join (select . . . from table2) on (some_condition). table2 might not be present depending on the environment, so I would like to perform the join only if table2 is present and otherwise ignore the subquery. The query below returns the table name if it exists: show tables in {DB_NAME} like '{table_name}'. But I don't know how to integrate this into my query so that it selects only if the table exists. Is there a way in a Hive query to check if …
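One common workaround is to guarantee the table exists before the query runs. This is only a sketch under assumptions (the schema and the id/val column names are invented; they must match the real table2): if table2 already exists the DDL is a no-op, and if it didn't, the LEFT JOIN simply matches nothing.

```sql
-- Placeholder so the query always compiles; contributes no rows
-- when table2 had to be created empty.
CREATE TABLE IF NOT EXISTS table2 (id INT, val STRING);

SELECT t1.*, t2.val
FROM table1 t1
LEFT JOIN (SELECT id, val FROM table2) t2
  ON t1.id = t2.id;
```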

Impala: Show tables like query

南楼画角 Submitted on 2021-02-07 14:45:47
Question: I am working with Impala and fetching the list of tables from a database with a pattern like the one below. Assume I have a database bank whose tables are: cust_profile, cust_quarter1_transaction, cust_quarter2_transaction, product_cust_xyz, … and so on. Now I am filtering with show tables in bank like '*cust*', and it returns the expected results: the tables that have the word cust in their name. Now my requirement is that I want all the tables which have cust …
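The exact requirement is cut off above, but Impala's SHOW TABLES pattern syntax supports '*' as a wildcard and '|' to separate alternative patterns, which covers most variants of this request. A sketch (the patterns are guesses at the intended requirement):

```sql
-- Tables whose names start with cust:
SHOW TABLES IN bank LIKE 'cust*';

-- Tables that either start with cust or end with _transaction:
SHOW TABLES IN bank LIKE 'cust*|*_transaction';
```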

ParseException in Hive

六眼飞鱼酱① Submitted on 2021-02-07 14:20:37
Question: I am trying to use a UDF in Hive, but when I try to create a temporary function with userdate1 as 'unixtimeToDate', I get this exception: hive> create temporary function userdate1 as 'unixtimeToDate'; FAILED: ParseException line 1:25 character ' ' not supported here line 1:35 character ' ' not supported here. I am not sure why the character is not supported. Could I get some guidance on this, please?

Answer 1: The exception is clear enough here: you have an error in your SQL. You have a full-width …
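The answer is pointing at a full-width character: the positions in the ParseException (1:25 and 1:35) line up with the spaces around the AS keyword, which were most likely typed as full-width spaces (U+3000) by an IME. Retyping the statement with plain ASCII spaces (U+0020) makes it parse:

```sql
-- Same statement as in the question, retyped with ASCII spaces only.
CREATE TEMPORARY FUNCTION userdate1 AS 'unixtimeToDate';
```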

Simple User/Password authentication for HiveServer2 (without Kerberos/LDAP)

蹲街弑〆低调 Submitted on 2021-02-07 12:51:41
Question: How can I provide simple property-file or database user/password authentication for HiveServer2? I already found this presentation about it, but it's not in English :(. The Cloudera reference manual mentions the hive.server2.authentication property, which supports CUSTOM implementations of the hive.server2.custom.authentication interface. How do I implement that?

Answer 1: In essence, you have to provide a Java application that performs your authentication. Maybe you're authenticating against a MySQL …
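The configuration side of this looks roughly like the fragment below (a sketch for hive-site.xml; the class name is a placeholder for your own implementation of the custom-authentication interface, packaged as a jar on HiveServer2's classpath):

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <!-- Placeholder: your own authenticator class -->
  <value>com.example.auth.MyPasswdAuthenticator</value>
</property>
```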

Spark + Hive : Number of partitions scanned exceeds limit (=4000)

有些话、适合烂在心里 Submitted on 2021-02-07 11:03:50
Question: We upgraded our Hadoop platform (Spark: 2.3.0, Hive: 3.1), and I'm facing this exception when reading some Hive tables in Spark: "Number of partitions scanned on table 'my_table' exceeds limit (=4000)". The tables we are working on are: table1, an external table with a total of ~12,300 partitions, partitioned by (col1: String, date1: String) (ORC, ZLIB-compressed); and table2, an external table with a total of 4,585 partitions, partitioned by (col21: String, date2: Date, col22: String) (ORC, uncompressed). [A] …
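The 4000 cap in this message typically comes from hive.metastore.limit.partition.request on the metastore side. The usual way around it, short of raising that limit on the metastore, is to filter on the partition columns so Spark requests fewer partitions. A sketch (the date range is an assumption about the workload):

```sql
-- Pruning by a partition column keeps the number of partitions
-- requested from the metastore under the limit.
SELECT *
FROM table1
WHERE date1 >= '2021-01-01' AND date1 < '2021-02-01';
```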

Spark SQL to Hive table - Datetime Field Hours Bug

孤街浪徒 Submitted on 2021-02-07 10:42:14
Question: I face this problem: when I load data into a timestamp field in Hive with spark.sql, the hours are strangely changed to 21:00:00! Let me explain: I have a CSV file that I read with spark.sql; I read the file, convert it to a dataframe, and store it in a Hive table. One of the fields in this file is a date in the format "3/10/2017". The Hive field I want to load it into is of Timestamp type (the reason I use this data type instead of Date is that I want to query the table with Impala, and Impala …
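A shift to a fixed wall-clock time like 21:00:00 usually points at a time-zone offset being applied during the string-to-timestamp conversion. One way to avoid it is to pin the Spark session time zone and parse with an explicit pattern, a sketch under assumptions (the column and table names are invented):

```sql
-- Pin the session zone so the cast does not shift the hours,
-- then parse "3/10/2017" explicitly to midnight of that day.
SET spark.sql.session.timeZone=UTC;

SELECT to_timestamp(date_col, 'M/d/yyyy') AS event_ts
FROM staging_table;
```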
