Hive

Issue in Hive Query due to memory

主宰稳场 posted on 2021-01-28 07:01:38
Question: We have an insert query that tries to insert data into a partitioned table by reading data from a non-partitioned table.
Query: insert into db1.fact_table PARTITION(part_col1, part_col2) ( col1, col2, col3, col4, col5, col6, . . . . . . . col32 LOAD_DT, part_col1, Part_col2 ) select col1, col2, col3, col4, col5, col6, . . . . . . . col32, part_col1, Part_col2 from db1.main_table WHERE col1=0;
The table has 34 columns; the number of records in the main table depends on the size of the input file, which we

Aws Athena - Rename column name

纵然是瞬间 posted on 2021-01-28 04:20:37
Question: I am trying to change a column name in an AWS Athena table, from old_name to new_name. Normal DDL commands do not affect the table (they cannot be executed). Is it possible to change a column name without deleting and re-creating the table from scratch?
Answer 1: I was mistaken; Athena uses Hive DDL syntax, so the correct command is: ALTER TABLE %%table-name%% CHANGE %%old-column-name%% %%new-column-name%%<string>; I based my answer on a Hive-related question.
Answer 2: You can find more about

Hive QL selecting numeric substring of string

我的未来我决定 posted on 2021-01-28 04:10:50
Question: I have a table with two columns: id, datastring. The id column is just a bigint, and the datastring column has elements that look like:
{"12345":[6789,true]}
{"1234678":[5678, false]}
I would like to select a table where the first column is the id and the second column is the number in the quoted part of the datastring. However, this number does not always have the same number of digits. The result should be:
id, numstring
4321, 12345
4322, 134678
Thanks in advance.
Answer 1: You have at least two options.
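As a rough illustration of the extraction asked about above, here is a minimal Python sketch. It assumes the wanted number is the first run of digits enclosed in double quotes; the names QUOTED_NUMBER and extract_numstring are made up for this example. The same pattern could probably be handed to Hive's regexp_extract with capture group 1, but that is untested here and is not necessarily what the truncated answer proposes.

```python
import re

# Matches the first run of digits enclosed in double quotes, e.g. "12345".
# Assumption: the quoted key is the only quoted number in the string.
QUOTED_NUMBER = re.compile(r'"(\d+)"')


def extract_numstring(datastring):
    """Return the quoted digits from datastring, or None if there are none."""
    match = QUOTED_NUMBER.search(datastring)
    return match.group(1) if match else None


# Sample rows copied from the question (id, datastring).
rows = [
    (4321, '{"12345":[6789,true]}'),
    (4322, '{"1234678":[5678, false]}'),
]

for row_id, datastring in rows:
    print(row_id, extract_numstring(datastring))
```

Because the pattern uses \d+ rather than a fixed width, the varying digit count mentioned in the question does not need special handling.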

What will happen if Hive number of reducers is different to number of keys?

吃可爱长大的小学妹 posted on 2021-01-28 03:27:16
Question: In Hive I often run queries like: select columnA, sum(columnB) from ... group by ... I read some MapReduce examples, and one reducer can only produce one key. It seems the number of reducers depends completely on the number of keys in columnA. So why can Hive set the number of reducers manually? If there are 10 different values in columnA and I set the number of reducers to 2, what will happen? Will each reducer be reused 5 times? If there are 10 different values in columnA and I set number of
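To make the reducer question above concrete, here is a small Python simulation of the shuffle step. It is a sketch under assumptions: assign_reducer uses a toy hash (the sum of character codes) as a stand-in for the usual "hash of the key modulo number of reducers" partitioning in MapReduce, and the names assign_reducer and group_sum are invented for this example. The point it tries to show is that one reducer can handle many keys, while all rows for a given key land on the same reducer.

```python
from collections import defaultdict


def assign_reducer(key, num_reducers):
    """Pick a reducer for a key: a stand-in for hash(key) mod num_reducers."""
    # Toy deterministic hash (an assumption for illustration only;
    # Hadoop would use the key object's own hash code).
    return sum(ord(ch) for ch in key) % num_reducers


def group_sum(rows, num_reducers):
    """Simulate SELECT columnA, SUM(columnB) ... GROUP BY columnA."""
    # Shuffle: every row is routed to the reducer chosen for its key,
    # and each reducer keeps a running sum per key it receives.
    reducers = defaultdict(lambda: defaultdict(int))
    for column_a, column_b in rows:
        reducers[assign_reducer(column_a, num_reducers)][column_a] += column_b
    return {reducer_id: dict(sums) for reducer_id, sums in reducers.items()}


# 10 distinct keys (k0..k9), each appearing twice, spread over 2 reducers.
rows = [(f"k{i}", i) for i in range(10)] * 2
result = group_sum(rows, num_reducers=2)

for reducer_id in sorted(result):
    print(reducer_id, result[reducer_id])
```

With 10 keys and 2 reducers, each simulated reducer ends up producing output for several keys in a single run; nothing is "reused" once per key.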

Sqoop: How to map input column names to different column names in Hive?

 ̄綄美尐妖づ posted on 2021-01-28 03:16:41
Question: Is there any way to map input column names to output Hive column names on the Sqoop command line or in the Sqoop API? For example: input SQL table (Name STRING, Phone INT) needs to map into output Hive table (ClientName STRING, PhoneNumber INT). I have to do this because Hive does not support Unicode in the table schema and cannot parse Cyrillic column names.
Answer 1: You can use a free-form query import (--query option) and say something like --query 'select Name as ClientName, Phone

What will happen if Hive number of reducers is different to number of keys?

孤街浪徒 posted on 2021-01-28 02:14:24
Question: In Hive I often run queries like: select columnA, sum(columnB) from ... group by ... I read some MapReduce examples, and one reducer can only produce one key. It seems the number of reducers depends completely on the number of keys in columnA. So why can Hive set the number of reducers manually? If there are 10 different values in columnA and I set the number of reducers to 2, what will happen? Will each reducer be reused 5 times? If there are 10 different values in columnA and I set number of

Select first row of group with criteria

自闭症网瘾萝莉.ら posted on 2021-01-27 22:07:16
Question: I have a table in this format:
FieldA  FieldB  FieldC
1111    ABC     X
1111    DEF     Y
1111    GHI     X
2222    JKL     Y
2222    MNO     X
3333    PQR     U
3333    STT     U
I want to select one FieldB per FieldA, with preference for X in FieldC (if there is no X, pick another one). I've tried using the RANK function with PARTITION BY, but I find it too inconsistent and I have now hit a wall. My output would look like this:
FieldA  FieldB  FieldC
1111    ABC     X
2222    MNO     X
3333    PQR     U
Query: Select rank() over (partition by Field3 order by Field1
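The selection described above (one row per FieldA, preferring X in FieldC) can be sketched in plain Python as follows. This is an illustration under assumptions, not the answer from the original thread: pick_one_per_group is a hypothetical helper, and ties between equally preferred rows are broken by input order. In SQL terms it loosely corresponds to numbering rows within each FieldA partition, ordered so that X rows come first, and keeping row number 1.

```python
from itertools import groupby

# Sample rows copied from the question: (FieldA, FieldB, FieldC).
rows = [
    ("1111", "ABC", "X"),
    ("1111", "DEF", "Y"),
    ("1111", "GHI", "X"),
    ("2222", "JKL", "Y"),
    ("2222", "MNO", "X"),
    ("3333", "PQR", "U"),
    ("3333", "STT", "U"),
]


def pick_one_per_group(rows, preferred="X"):
    """Return one row per FieldA, preferring rows whose FieldC is `preferred`."""
    # Sort by FieldA, then put preferred rows first (False sorts before True).
    # sorted() is stable, so equally ranked rows keep their input order.
    ordered = sorted(rows, key=lambda r: (r[0], r[2] != preferred))
    # The first row of each FieldA group is the chosen one.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


for row in pick_one_per_group(rows):
    print(*row)
```

Partitioning by the grouping column (FieldA here) rather than by the preference column is the key design choice. A database will not necessarily break ties by input order the way the stable sort does, so an explicit tiebreaker column would be needed there.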

Select first row of group with criteria

与世无争的帅哥 posted on 2021-01-27 19:11:07
Question: I have a table in this format:
FieldA  FieldB  FieldC
1111    ABC     X
1111    DEF     Y
1111    GHI     X
2222    JKL     Y
2222    MNO     X
3333    PQR     U
3333    STT     U
I want to select one FieldB per FieldA, with preference for X in FieldC (if there is no X, pick another one). I've tried using the RANK function with PARTITION BY, but I find it too inconsistent and I have now hit a wall. My output would look like this:
FieldA  FieldB  FieldC
1111    ABC     X
2222    MNO     X
3333    PQR     U
Query: Select rank() over (partition by Field3 order by Field1

spark throws error when reading hive table

谁说胖子不能爱 posted on 2021-01-27 13:56:17
Question: I am trying to run select * from db.abc in Hive. This Hive table was loaded using Spark. The query does not work and shows an error: Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0). When I use the following properties, I am able to query from Hive: set hive.mapred.mode=nonstrict; set hive.optimize.ppd=true; set hive.optimize.index.filter=true; set hive.tez.bucket.pruning=true; set hive.explain.user=false; set hive.fetch.task.conversion=none; now when

How to get lastaltertimestamp from Hive table?

烈酒焚心 posted on 2021-01-27 13:56:04
Question: Teradata has the concept of lastaltertimestamp, which is the last time an alter table command was executed on a table. lastaltertimestamp can be queried. Does Hive have a similar value that can be queried? The timestamp returned by hdfs dfs -ls /my/hive/file does not reflect alter table commands, so alter table must not modify the file backing the Hive table. describe formatted does not provide a last-alter-timestamp either. Thanks.
Answer 1: Hive stores metadata in a database, so files never get