Hive

Drop all partitions from a Hive table?

Submitted by 蓝咒 on 2020-07-17 09:48:25
Question: How can I drop all partitions currently loaded in a Hive table? I can drop a single partition with

    alter table <table> drop partition(a=..., b=...);

and I can load all partitions with the recover partitions statement, but I cannot seem to drop all partitions at once. I'm using the latest Hive version supported by EMR, 0.8.1.

Answer 1: As of version 0.9.0 you can use comparators in the drop partition statement, which may be used to drop all partitions at once. An example, taken from the drop_partitions_filter.q…
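
A minimal sketch of the comparator form, assuming Hive 0.9.0+ and a hypothetical table partitioned by a string column ds:

    -- A catch-all comparator such as (ds > '0') matches every partition
    -- whose value compares greater than the literal '0', which in
    -- practice drops all partitions in one statement.
    ALTER TABLE mytable DROP PARTITION (ds > '0');

This works because typical partition values (dates, numeric IDs) all sort above '0' as strings; a value that sorted at or below '0' would survive and need its own drop.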

pyspark dataframe withColumn command not working

Submitted by 杀马特。学长 韩版系。学妹 on 2020-07-15 09:22:32
Question: I have an input dataframe, df_input (updated df_input):

    |comment|inp_col|inp_val|
    |11     |a      |a1     |
    |12     |a      |a2     |
    |15     |b      |b3     |
    |16     |b      |b4     |
    |17     |c      |&b     |
    |17     |c      |c5     |
    |17     |d      |&c     |
    |17     |d      |d6     |
    |17     |e      |&d     |
    |17     |e      |e7     |

I want to replace each variable in the inp_val column with its value. I have tried the code below to create a new column, taking the list of values that start with '&':

    df_new = df_inp.select('inp_val').where(df_inp.inp_val.substr(1, 1) == '&')

Now I'm iterating over the list to replace the '…
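
A sketch of one way to finish the substitution (column names follow the question; this resolves one level of '&' indirection per pass, so chained references like &d -> &c -> &b would need the join repeated until no '&' values remain):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df_inp = spark.createDataFrame(
        [(11, "a", "a1"), (12, "a", "a2"), (15, "b", "b3"), (16, "b", "b4"),
         (17, "c", "&b"), (17, "c", "c5"), (17, "d", "&c"), (17, "d", "d6"),
         (17, "e", "&d"), (17, "e", "e7")],
        ["comment", "inp_col", "inp_val"])

    # Split the rows into direct values and '&' references.
    direct = df_inp.filter(~df_inp.inp_val.startswith("&"))
    refs = (df_inp.filter(df_inp.inp_val.startswith("&"))
            .withColumn("ref_col", F.expr("substring(inp_val, 2)")))

    # Replace each reference row with the referenced column's direct values.
    resolved = (refs.join(direct.select(F.col("inp_col").alias("ref_col"),
                                        F.col("inp_val").alias("new_val")),
                          on="ref_col")
                .select("comment", "inp_col", F.col("new_val").alias("inp_val")))

    df_out = direct.union(resolved)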

Load text into an ORC file

Submitted by 江枫思渺然 on 2020-07-10 09:00:05
Question: How do I load a text file into a Hive ORC external table?

    create table MyDB.TEST (
      Col1 String,
      Col2 String,
      Col3 String,
      Col4 String)
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

I have already created the table above as ORC, but when fetching data from the table it shows the error below:

    Failed with exception java.io.IOException:org.apache.orc.FileFormatException: Malformed ORC file hdfs://localhost:9000/Ext/sqooporc…
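
A common fix, sketched with a hypothetical staging table and input path: LOAD DATA only moves files into the table's directory without converting them, so text files placed under an ORC table produce exactly this "Malformed ORC file" error. Stage the text in a TEXTFILE table first, then insert into the ORC table so Hive rewrites the rows as ORC:

    -- Hypothetical staging table matching TEST's columns.
    CREATE TABLE MyDB.TEST_STG (Col1 STRING, Col2 STRING, Col3 STRING, Col4 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- The input path here is made up for illustration.
    LOAD DATA INPATH '/Ext/sqooporc/input.txt' INTO TABLE MyDB.TEST_STG;

    -- Hive converts the rows to ORC during the insert.
    INSERT OVERWRITE TABLE MyDB.TEST SELECT * FROM MyDB.TEST_STG;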

AWS EMR error: All slaves in the job flow were terminated

Submitted by 折月煮酒 on 2020-07-10 06:37:33
Question: I am using the Elastic MapReduce infrastructure on Amazon AWS. A jobflow was terminated automatically; the last state change reason, according to the Amazon Console, is: "All slaves in the job flow were terminated". The create-jobflow command:

    elastic-mapreduce --create --name MyCluster --alive \
      --instance-group master --instance-type m1.xlarge --instance-count 1 --bid-price 2.0 \
      --instance-group core --instance-type m1.xlarge --instance-count 10 --bid-price 2.0 \
      --hive-interactive --enable-debugging

Details about…
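
For context: --bid-price makes these instance groups Spot instances, which AWS reclaims whenever the Spot market price exceeds the bid, and reclaimed core nodes produce exactly this message. One possible mitigation (a sketch, not a confirmed fix for this cluster) is to request on-demand instances by omitting the bid:

    # Same cluster definition without --bid-price; on-demand instances
    # are not reclaimed when the Spot market price rises.
    elastic-mapreduce --create --name MyCluster --alive \
      --instance-group master --instance-type m1.xlarge --instance-count 1 \
      --instance-group core --instance-type m1.xlarge --instance-count 10 \
      --hive-interactive --enable-debugging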

How to keep column names in camel case in Hive

Submitted by 无人久伴 on 2020-07-09 05:02:20
Question:

    select '12345' as `EmpId`;
    -- output is empid with value 12345

Any leads on how to keep the column name as EmpId?

Answer 1: Not possible. This is a limitation of the Hive metastore: it stores the schema of a table in all lowercase. Hive uses this method to normalize column names; see Table.java:

    private static String normalize(String colName) throws HiveException {
      if (!MetaStoreServerUtils.validateColumnName(colName)) {
        throw new HiveException("Invalid column name '" + colName + "' in the table…
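
A quick way to observe the behavior (the table name here is made up):

    -- DESCRIBE reports the column as empid, not EmpId, because the
    -- metastore lowercases names at creation time.
    CREATE TABLE case_demo (EmpId STRING);
    DESCRIBE case_demo;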

How to read stream of structured data and write to Hive table

Submitted by 六眼飞鱼酱① on 2020-07-07 11:25:27
Question: There is a need to read a stream of structured data from Kafka and write it to an already existing Hive table. Upon analysis, it appears that one option is to readStream from the Kafka source and then writeStream to a File sink at an HDFS path. My question here is: is it possible to write directly to a Hive table? Or is there a workaround approach that can be followed for this use case?

EDIT1: .foreachBatch seems to be working, but it has the issue mentioned below…
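
A minimal sketch of the .foreachBatch route the asker mentions (broker, topic, table name, and checkpoint path are all hypothetical): each micro-batch arrives as an ordinary DataFrame, which the batch writer can insert into an existing Hive table:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    stream_df = (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "broker:9092")
                 .option("subscribe", "events")
                 .load())

    def write_batch(batch_df, batch_id):
        # Kafka delivers raw bytes; real code would parse the payload
        # into the Hive table's schema before inserting.
        (batch_df.selectExpr("CAST(value AS STRING) AS value")
         .write.mode("append")
         .insertInto("mydb.existing_table"))

    (stream_df.writeStream
     .foreachBatch(write_batch)
     .option("checkpointLocation", "/tmp/stream_ckpt")
     .start())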

Hive: modifying an external table's location takes too long

Submitted by 送分小仙女□ on 2020-07-07 05:38:09
Question: Hive has two kinds of tables, managed and external; for the difference, see Managed vs. External Tables. Currently, to move an external database from HDFS to Alluxio, I need to change each external table's location to alluxio://. The statement is something like:

    alter table catalog_page set location "alluxio://node1:19998/user/root/tpcds/1000/catalog_returns"

In my understanding this should be a simple metastore modification; however, for some tables the modification…
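
One plausible explanation (an educated guess; the partition column and value below are hypothetical): on a partitioned table, SET LOCATION only changes where new partitions are created, while every existing partition keeps its own location in the metastore. Moving a table completely then costs one metastore update per partition, roughly:

    -- Repeated for each of the table's partitions.
    ALTER TABLE catalog_returns PARTITION (cr_returned_date_sk=2450821)
      SET LOCATION "alluxio://node1:19998/user/root/tpcds/1000/catalog_returns/cr_returned_date_sk=2450821";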

Partitions in Hive: interview questions

Submitted by 删除回忆录丶 on 2020-07-05 11:09:10
Question:

1) If the partitioned column doesn't have data, what error will you get when you query on it?
2) If some rows don't have the partitioned column, how will those rows be handled? Will there be any data loss?
3) Why does bucketing need to be done on a numeric column? Can we use a string column as well? What is the process, and on what basis do you choose the bucketing column? (See the sketch after this list.)
4) Will the internal table details also be stored in the metastore, or only the external table details…
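
On question 3, a small sketch (the table name is made up): Hive does not in fact require a numeric bucketing column. CLUSTERED BY hashes the column value into a fixed number of buckets, and string columns hash just as well; a high-cardinality column that is frequently joined or sampled on is the usual choice:

    -- Bucketing on a string column: Hive hashes emp_id and assigns
    -- each row to one of 8 buckets.
    CREATE TABLE emp_bucketed (emp_id STRING, name STRING)
    CLUSTERED BY (emp_id) INTO 8 BUCKETS;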