Hive

Presto query error on hive ORC, Can not read SQL type real from ORC stream of type DOUBLE

ぃ、小莉子 submitted on 2021-01-29 01:51:50

Question: I was able to run a query in Presto to read the non-float columns from a Hive ORC (Snappy) table. However, when I select any of the float datatype columns through the Presto CLI, I get the error message below. Any suggestions on an alternative other than changing the field type to double in the target Hive table? presto:sample> select * from emp_detail; Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part
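
The usual fix (a hedged sketch; it is essentially the type change the question hopes to avoid, but confined to the Hive side) is to align the declared column type with the DOUBLE stream the ORC files actually contain. float_col below is a placeholder for the affected column:

-- In Hive: redeclare the column so the table schema matches the ORC data.
ALTER TABLE emp_detail CHANGE COLUMN float_col float_col DOUBLE;

-- Presto can then open the split, and a REAL view of the data is still
-- available via an explicit cast at query time:
SELECT CAST(float_col AS REAL) FROM emp_detail;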

How to declare and use variable in Hive SQL?

瘦欲@ submitted on 2021-01-28 19:55:29

Question: I am using the syntax below to declare and use a variable in a Hive SQL query, but it gives me an error: SET aa='10'; SELECT col1 as data, ${aa} as myVar from myTable; ERROR: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify aa at runtime. It is not in list of params that are allowed to be modified at runtime I have also tried using hiveconf: SELECT ${hiveconf:aa} from myTable; Answer 1: You cannot pass a variable like that. You need to use --hivevar . You
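
A minimal sketch of the --hivevar approach the answer points to, reusing aa and myTable from the question:

-- Define the variable when starting the client:
--   hive --hivevar aa=10
-- or inside an existing session:
SET hivevar:aa=10;

-- Reference it through the hivevar namespace:
SELECT col1 AS data, ${hivevar:aa} AS myVar FROM myTable;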

Hive table re-created before every data load

前提是你 submitted on 2021-01-28 14:30:56

Question: I saw an application dropping its external table, creating it again, loading the data, and then running the MSCK command on every data load. What is the benefit of dropping and re-creating the table every time? Answer 1: There is no benefit in dropping and recreating an EXTERNAL table, because dropping the table leaves the data intact. There may, though, be a benefit in dropping and re-creating a MANAGED table, because that drops the data as well. One possible scenario if you are running on S3: dropping files early before the
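
A lighter-weight pattern than DROP + CREATE on every load (a hedged sketch; the table and partition names are illustrative) is to keep the EXTERNAL table definition and only refresh partition metadata after each load:

-- Discover any partition directories newly written under the table location:
MSCK REPAIR TABLE my_external_table;

-- Or register one known partition explicitly:
ALTER TABLE my_external_table
  ADD IF NOT EXISTS PARTITION (load_date='2021-01-28')
  LOCATION 'hdfs:///data/my_external_table/load_date=2021-01-28';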

Optimize Hive Query. java.lang.OutOfMemoryError: Java heap space/GC overhead limit exceeded

我怕爱的太早我们不能终老 submitted on 2021-01-28 14:18:32

Question: How can I optimize a query of this form, since I keep running into this OOM error, or come up with a better execution plan? If I remove the substring clause, the query works fine, suggesting that this step takes a lot of memory. When the job fails, the beeline output shows the OOM Java heap space error. Readings online suggested increasing export HADOOP_HEAPSIZE, but this still results in the error. Another thing I tried was increasing hive.tez.container.size and hive.tez.java.opts (tez
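
For reference, the settings named above are normally raised together; a hedged sketch with example values that must be sized to the actual cluster:

-- Memory per Tez container, in MB:
SET hive.tez.container.size=8192;
-- JVM heap for the container, typically around 80% of the container size:
SET hive.tez.java.opts=-Xmx6553m;
-- If the reduce side is what overflows, spreading work across more
-- reducers can also help:
SET hive.exec.reducers.bytes.per.reducer=67108864;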

Merging multiple arrays into a map

谁说我不能喝 submitted on 2021-01-28 14:08:30

Question: I have some data (a sample from the full table) that looks like this:

| prov_id | hotel_id | m_id | apis_xml | company_id | yyyy_mm_dd |
|---------|----------|------|----------|------------|------------|
| 945     | 78888    | 3910 | [5]      | 998        | 2020-05-20 |
| 1475    | 78888    | 6676 | [1,2,4]  | 37         | 2020-05-20 |
| 1475    | 78888    | 6670 | [1,2,4]  | 37         | 2020-05-20 |
| 945     | 78888    | 2617 | [5]      | 998        | 2020-05-20 |

I want to find the lowest apis_xml value per hotel and have the associated prov_id set as the
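
One hedged way to get "lowest apis_xml element per hotel, with its prov_id" in Hive, assuming apis_xml is an array column and my_table stands in for the real table:

SELECT hotel_id, prov_id, api
FROM (
  SELECT t.hotel_id, t.prov_id, a.api,
         -- rank each exploded array element within its hotel, smallest first
         ROW_NUMBER() OVER (PARTITION BY t.hotel_id ORDER BY a.api) AS rn
  FROM my_table t
  LATERAL VIEW explode(t.apis_xml) a AS api
) x
WHERE rn = 1;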

How to force MR execution when running simple Hive query?

断了今生、忘了曾经 submitted on 2021-01-28 13:31:18

Question: There is Hive 2.1.1 over MR, a table test_table stored as sequencefile, and the following ad-hoc query: select t.* from test_table t where t.test_column = 100 Although this query can be executed without starting MR (as a fetch task), it sometimes takes longer to scan the HDFS files than to trigger a single map job. When I want to enforce MR execution, I make the query more complex, e.g. by using distinct . The significant drawbacks of this approach are: query results may differ from the original
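
A cleaner switch than rewriting the query is to disable fetch-task conversion for the session, which forces even trivial scans through MR; a minimal sketch:

-- Turn off the fetch-task optimization so the scan runs as a MapReduce job:
SET hive.fetch.task.conversion=none;

SELECT t.* FROM test_table t WHERE t.test_column = 100;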

In Hive, how to read through NULL / empty tags present within an XML using explode(XPATH(..)) function?

杀马特。学长 韩版系。学妹 submitted on 2021-01-28 11:51:40

Question: In the Hive query below, I need to read the null / empty "string" tags as well from the XML content; only the non-null "string" tags are currently picked up in the XPATH() list.

with your_data as (
  select '<ParentArray>
    <ParentFieldArray>
      <Name>ABCD</Name>
      <Value>
        <string>111</string>
        <string></string>
        <string>222</string>
      </Value>
    </ParentFieldArray>
    <ParentFieldArray>
      <Name>EFGH</Name>
      <Value>
        <string/>
        <string>444</string>
        <string></string>
        <string>555</string>
      </Value>
    <
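
Hive's xpath() UDF skips nodes with no text, so one hedged workaround is to inject a sentinel into the empty tags before extraction. 'NULL' here is an arbitrary placeholder, and xml_col stands in for the XML string column:

SELECT xpath(
         -- rewrite both <string/> and <string></string> so they carry text
         regexp_replace(
           regexp_replace(xml_col, '<string\\s*/>', '<string>NULL</string>'),
           '<string></string>', '<string>NULL</string>'),
         '//Value/string/text()') AS string_values
FROM your_data;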

Hive: Populate other columns based on unique value in a particular column

十年热恋 submitted on 2021-01-28 09:40:54

Question: I have two tables in Hive, as shown below.

Table 1:
id name value
1 abc stack
3 abc overflow
4 abc foo
6 abc bar

Table 2:
id name value
5 xyz overflow
9 xyz stackoverflow
3 xyz foo
23 xyz bar

I need to take the count of the value column without considering the id and name columns. The expected output is:

id name value
1 abc stack
9 xyz stackoverflow

I tried the following, which works in other databases but not in Hive: select id,name,value from (SELECT id,name,value FROM table1 UNION ALL SELECT id,name
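
Judging from the expected output, the goal appears to be rows whose value occurs exactly once across both tables. Since Hive rejects non-grouped columns in a GROUP BY select list, a hedged sketch using an analytic count instead:

SELECT id, name, value
FROM (
  SELECT id, name, value,
         -- how many times this value appears across both tables
         COUNT(*) OVER (PARTITION BY value) AS cnt
  FROM (
    SELECT id, name, value FROM table1
    UNION ALL
    SELECT id, name, value FROM table2
  ) u
) t
WHERE cnt = 1;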

How to undo ALTER TABLE … ADD PARTITION without deleting data

时光怂恿深爱的人放手 submitted on 2021-01-28 07:07:30

Question: Let's suppose I have two Hive tables, table_1 and table_2 . I use: ALTER TABLE table_2 ADD PARTITION (col=val) LOCATION [table_1_location] Now, table_2 will have the data of table_1 at the partition where col = val . What I want to do is reverse this process: I want table_2 not to have the partition at col=val , and I want table_1 to keep its original data. How can I do this? Answer 1: Make your table EXTERNAL first: ALTER TABLE table_2 SET TBLPROPERTIES('EXTERNAL'='TRUE'); Then drop the partition,
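
Completing the answer's idea as a hedged sketch (col/val as in the question): once the table is EXTERNAL, dropping the partition removes only metadata, so the files under table_1's location are untouched:

-- Mark the table EXTERNAL so a drop never deletes data:
ALTER TABLE table_2 SET TBLPROPERTIES('EXTERNAL'='TRUE');
-- Remove just the partition metadata:
ALTER TABLE table_2 DROP PARTITION (col='val');
-- Optionally revert the table to managed afterwards:
ALTER TABLE table_2 SET TBLPROPERTIES('EXTERNAL'='FALSE');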