Hive

Drop all partitions from a Hive table?

Submitted by 蓝咒 on 2020-07-17 09:48:25
Question: How can I drop all partitions currently loaded in a Hive table? I can drop a single partition with

    alter table <table> drop partition(a=..., b=...);

and I can load all partitions with the recover partitions statement, but I cannot seem to drop all partitions at once. I'm using the latest Hive version supported by EMR, 0.8.1.

Answer 1: As of version 0.9.0 you can use comparators in the drop partition statement, which may be used to drop all partitions at once. An example, taken from the drop_partitions_filter.q…
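
A minimal sketch of the comparator form, assuming Hive 0.9.0+ and a hypothetical table partitioned by a string column ds:

    -- A catch-all comparator such as (ds > '0') matches every partition
    -- whose value compares greater than the literal '0', which in
    -- practice drops all partitions in one statement.
    ALTER TABLE mytable DROP PARTITION (ds > '0');

This works because typical partition values (dates, numeric IDs) all sort above '0' as strings; a value that sorted at or below '0' would survive and need its own drop.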

pyspark dataframe withColumn command not working

Submitted by 杀马特。学长 韩版系。学妹 on 2020-07-15 09:22:32
Question: I have an input dataframe, df_input (updated df_input):

    |comment|inp_col|inp_val|
    |11     |a      |a1     |
    |12     |a      |a2     |
    |15     |b      |b3     |
    |16     |b      |b4     |
    |17     |c      |&b     |
    |17     |c      |c5     |
    |17     |d      |&c     |
    |17     |d      |d6     |
    |17     |e      |&d     |
    |17     |e      |e7     |

I want to replace each variable in the inp_val column with its value. I have tried the code below to create a new column, taking the list of values that start with '&':

    df_new = df_inp.select('inp_val').where(df_inp.inp_val.substr(1, 1) == '&')

Now I'm iterating over the list to replace the '…
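
A sketch of one way to finish the substitution (column names follow the question; this resolves one level of '&' indirection per pass, so chained references like &d -> &c -> &b would need the join repeated until no '&' values remain):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df_inp = spark.createDataFrame(
        [(11, "a", "a1"), (12, "a", "a2"), (15, "b", "b3"), (16, "b", "b4"),
         (17, "c", "&b"), (17, "c", "c5"), (17, "d", "&c"), (17, "d", "d6"),
         (17, "e", "&d"), (17, "e", "e7")],
        ["comment", "inp_col", "inp_val"])

    # Split the rows into direct values and '&' references.
    direct = df_inp.filter(~df_inp.inp_val.startswith("&"))
    refs = (df_inp.filter(df_inp.inp_val.startswith("&"))
            .withColumn("ref_col", F.expr("substring(inp_val, 2)")))

    # Replace each reference row with the referenced column's direct values.
    resolved = (refs.join(direct.select(F.col("inp_col").alias("ref_col"),
                                        F.col("inp_val").alias("new_val")),
                          on="ref_col")
                .select("comment", "inp_col", F.col("new_val").alias("inp_val")))

    df_out = direct.union(resolved)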

Load text into an ORC file

Submitted by 江枫思渺然 on 2020-07-10 09:00:05
Question: How do I load a text file into a Hive ORC external table?

    create table MyDB.TEST (
      Col1 String,
      Col2 String,
      Col3 String,
      Col4 String)
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

I have already created the table above as ORC, but when fetching data from the table it shows the error below:

    Failed with exception java.io.IOException:org.apache.orc.FileFormatException: Malformed ORC file hdfs://localhost:9000/Ext/sqooporc…
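
A common fix, sketched with a hypothetical staging table and input path: LOAD DATA only moves files into the table's directory without converting them, so text files placed under an ORC table produce exactly this "Malformed ORC file" error. Stage the text in a TEXTFILE table first, then insert into the ORC table so Hive rewrites the rows as ORC:

    -- Hypothetical staging table matching TEST's columns.
    CREATE TABLE MyDB.TEST_STG (Col1 STRING, Col2 STRING, Col3 STRING, Col4 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- The input path here is made up for illustration.
    LOAD DATA INPATH '/Ext/sqooporc/input.txt' INTO TABLE MyDB.TEST_STG;

    -- Hive converts the rows to ORC during the insert.
    INSERT OVERWRITE TABLE MyDB.TEST SELECT * FROM MyDB.TEST_STG;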

AWS EMR error: All slaves in the job flow were terminated

Submitted by 折月煮酒 on 2020-07-10 06:37:33
Question: I am using the Elastic MapReduce infrastructure on Amazon AWS. A jobflow was terminated automatically; the last state change reason, according to the Amazon Console, is: "All slaves in the job flow were terminated". The create-jobflow command:

    elastic-mapreduce --create --name MyCluster --alive \
      --instance-group master --instance-type m1.xlarge --instance-count 1 --bid-price 2.0 \
      --instance-group core --instance-type m1.xlarge --instance-count 10 --bid-price 2.0 \
      --hive-interactive --enable-debugging

Details about…
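
For context: --bid-price makes these instance groups Spot instances, which AWS reclaims whenever the Spot market price exceeds the bid, and reclaimed core nodes produce exactly this message. One possible mitigation (a sketch, not a confirmed fix for this cluster) is to request on-demand instances by omitting the bid:

    # Same cluster definition without --bid-price; on-demand instances
    # are not reclaimed when the Spot market price rises.
    elastic-mapreduce --create --name MyCluster --alive \
      --instance-group master --instance-type m1.xlarge --instance-count 1 \
      --instance-group core --instance-type m1.xlarge --instance-count 10 \
      --hive-interactive --enable-debugging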

How to keep column names in camel case in Hive

Submitted by 无人久伴 on 2020-07-09 05:02:20
Question:

    select '12345' as `EmpId`;
    -- output is empid with value 12345

Any leads on how to keep the column name as EmpId?

Answer 1: Not possible. This is a limitation of the Hive metastore: it stores the schema of a table in all lowercase. Hive uses this method to normalize column names; see Table.java:

    private static String normalize(String colName) throws HiveException {
      if (!MetaStoreServerUtils.validateColumnName(colName)) {
        throw new HiveException("Invalid column name '" + colName + "' in the table…
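
A quick way to observe the behavior (the table name here is made up):

    -- DESCRIBE reports the column as empid, not EmpId, because the
    -- metastore lowercases names at creation time.
    CREATE TABLE case_demo (EmpId STRING);
    DESCRIBE case_demo;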

How to read stream of structured data and write to Hive table

Submitted by 六眼飞鱼酱① on 2020-07-07 11:25:27
Question: There is a need to read a stream of structured data from Kafka and write it to an already existing Hive table. Upon analysis, it appears that one option is to readStream from the Kafka source and then writeStream to a File sink at an HDFS path. My question here is: is it possible to write directly to a Hive table? Or is there a workaround approach that can be followed for this use case?

EDIT1: .foreachBatch seems to be working, but it has the issue mentioned below…
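
A minimal sketch of the .foreachBatch route the asker mentions (broker, topic, table name, and checkpoint path are all hypothetical): each micro-batch arrives as an ordinary DataFrame, which the batch writer can insert into an existing Hive table:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    stream_df = (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "broker:9092")
                 .option("subscribe", "events")
                 .load())

    def write_batch(batch_df, batch_id):
        # Kafka delivers raw bytes; real code would parse the payload
        # into the Hive table's schema before inserting.
        (batch_df.selectExpr("CAST(value AS STRING) AS value")
         .write.mode("append")
         .insertInto("mydb.existing_table"))

    (stream_df.writeStream
     .foreachBatch(write_batch)
     .option("checkpointLocation", "/tmp/stream_ckpt")
     .start())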

Hive: modifying an external table's location takes too long

Submitted by 送分小仙女□ on 2020-07-07 05:38:09
Question: Hive has two kinds of tables, managed and external; for the difference, see Managed vs. External Tables. Currently, to move an external database from HDFS to Alluxio, I need to change each external table's location to alluxio://. The statement is something like:

    alter table catalog_page set location "alluxio://node1:19998/user/root/tpcds/1000/catalog_returns"

In my understanding this should be a simple metastore modification; however, for some tables the modification…
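
One plausible explanation (an educated guess; the partition column and value below are hypothetical): on a partitioned table, SET LOCATION only changes where new partitions are created, while every existing partition keeps its own location in the metastore. Moving a table completely then costs one metastore update per partition, roughly:

    -- Repeated for each of the table's partitions.
    ALTER TABLE catalog_returns PARTITION (cr_returned_date_sk=2450821)
      SET LOCATION "alluxio://node1:19998/user/root/tpcds/1000/catalog_returns/cr_returned_date_sk=2450821";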

Partitions in Hive: interview questions

Submitted by 删除回忆录丶 on 2020-07-05 11:09:10
Question:

1) If the partitioned column doesn't have data, what error will you get when you query on it?
2) If some rows don't have the partitioned column, how will those rows be handled? Will there be any data loss?
3) Why does bucketing need to be done on a numeric column? Can we use a string column as well? What is the process, and on what basis do you choose the bucketing column? (See the sketch after this list.)
4) Will the internal table details also be stored in the metastore, or only the external table details…
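
On question 3, a small sketch (the table name is made up): Hive does not in fact require a numeric bucketing column. CLUSTERED BY hashes the column value into a fixed number of buckets, and string columns hash just as well; a high-cardinality column that is frequently joined or sampled on is the usual choice:

    -- Bucketing on a string column: Hive hashes emp_id and assigns
    -- each row to one of 8 buckets.
    CREATE TABLE emp_bucketed (emp_id STRING, name STRING)
    CLUSTERED BY (emp_id) INTO 8 BUCKETS;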