Hive | 易学教程

Issue in Hive Query due to memory

Question: We have an INSERT query that loads data into a partitioned table by reading from a non-partitioned table. Query: insert into db1.fact_table PARTITION(part_col1, part_col2) ( col1, col2, col3, col4, col5, col6, . . . . . . . col32 LOAD_DT, part_col1, Part_col2 ) select col1, col2, col3, col4, col5, col6, . . . . . . . col32, part_col1, Part_col2 from db1.main_table WHERE col1=0; The table has 34 columns, and the number of records in the main table depends on the size of the input file, which we

AWS Athena - Rename column name

Question: I am trying to rename a column in an AWS Athena table, from old_name to new_name. Normal DDL commands do not affect the table (they cannot be executed). Is it possible to change a column name without deleting and re-creating the table from scratch? Answer 1: I was mistaken; Athena uses Hive DDL syntax, so the correct command is: ALTER TABLE %%table-name%% CHANGE %%old-column-name%% %%new-column-name%%<string>; I based my answer on a Hive-related question. Answer 2: You can find more about

Hive QL selecting numeric substring of string

Question: I have a table with two columns: id and datastring. The id column is just a bigint, and the datastring column has elements that look like {"12345":[6789,true]} and {"1234678":[5678, false]}. I would like to select a table where the first column is the id and the second column is the number in the quoted part of the datastring. However, this number does not always have the same number of digits. The result should be: id, numstring; 4321, 12345; 4322, 134678. Thanks in advance. Answer 1: You have at least two options.
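
As an illustration of the extraction the question describes, here is a minimal Python sketch of a regular expression that captures the digits between the first pair of double quotes, whatever their length. It is only one possible approach and not necessarily one of the options the truncated answer goes on to list. The function name `extract_numstring` and the sample rows are assumptions made for the example. Hive's built-in `regexp_extract` accepts a similar capture-group pattern, although backslashes usually need to be doubled inside a Hive string literal.

```python
import re

# A double quote, one or more digits (captured), then a closing double quote.
QUOTED_NUMBER = re.compile(r'"(\d+)"')


def extract_numstring(datastring):
    """Return the digits inside the first quoted number, or None if absent."""
    match = QUOTED_NUMBER.search(datastring)
    return match.group(1) if match else None


# Hypothetical sample rows modeled on the question's data.
rows = [
    (4321, '{"12345":[6789,true]}'),
    (4322, '{"1234678":[5678, false]}'),
]

for row_id, datastring in rows:
    print(row_id, extract_numstring(datastring))
```

Because the capture group uses `\d+`, the pattern does not depend on a fixed number of digits.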

What will happen if Hive number of reducers is different to number of keys?

Question: In Hive I often run queries like: select columnA, sum(columnB) from ... group by ... I read some MapReduce examples, and it appears that one reducer can only produce one key. It seems the number of reducers depends completely on the number of keys in columnA. So why can Hive set the number of reducers manually? If there are 10 different values in columnA and I set the number of reducers to 2, what will happen? Will each reducer be reused 5 times? If there are 10 different values in columnA and I set number of
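
To make the scenario in the question concrete, the following Python sketch simulates hash partitioning of group-by keys across a fixed number of reducers. In MapReduce-style execution a reducer is not limited to a single key: every row with the same key goes to the same reducer, and one reducer can handle many keys. The helper names `assign_reducer` and `group_sums_by_reducer`, the simple string hash, and the sample data are all assumptions made for illustration; this is not Hive's actual implementation.

```python
from collections import defaultdict


def assign_reducer(key, num_reducers):
    """Pick a reducer index for a key, mimicking hash partitioning.

    A simple deterministic string hash stands in for the real one
    (an assumption for illustration; Python's built-in hash() is
    randomized per process for strings).
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0x7FFFFFFF
    return h % num_reducers


def group_sums_by_reducer(rows, num_reducers):
    """Return {reducer_index: {key: sum_of_values}} for (key, value) rows."""
    reducers = defaultdict(lambda: defaultdict(int))
    for key, value in rows:
        reducers[assign_reducer(key, num_reducers)][key] += value
    return {r: dict(sums) for r, sums in reducers.items()}


# 10 distinct keys and 2 reducers, as in the question (hypothetical data).
rows = [(f"k{i}", i) for i in range(10)] + [(f"k{i}", 1) for i in range(10)]
result = group_sums_by_reducer(rows, num_reducers=2)

for reducer, sums in sorted(result.items()):
    print(reducer, sums)
```

Each key appears under exactly one reducer index, and each reducer holds the sums for several keys, so the number of reducers controls parallelism rather than the number of output groups.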

Sqoop: How to map input column names to different column names in Hive?

Question: Is there any way to map input column names to output Hive column names on the Sqoop command line or through the Sqoop API? For example: Input SQL table: (Name STRING, Phone INT) --> needs to map into --> Output Hive table: (ClientName STRING, PhoneNumber INT). I have to do this because Hive does not support Unicode in the table schema and cannot parse Cyrillic column names. Answer 1: You can use a free-form query import (--query option) and say something like --query 'select Name as ClientName, Phone

Select first row of group with criteria

Question: I have a table in this format:
FieldA  FieldB  FieldC
1111    ABC     X
1111    DEF     Y
1111    GHI     X
2222    JKL     Y
2222    MNO     X
3333    PQR     U
3333    STT     U
I want to select one FieldB per FieldA, preferring rows where FieldC is X (if there is no X, pick another one). I've tried using the RANK function with PARTITION BY, but I find it too inconsistent and I have now hit a wall. My output would look like this:
FieldA  FieldB  FieldC
1111    ABC     X
2222    MNO     X
3333    PQR     U
Query: Select rank() over (partition by Field3 order by Field1
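
The selection rule in the question (one row per FieldA, preferring FieldC = X) can be sketched in plain Python as below. The function name `pick_preferred` is an assumption for the example, and breaking ties by input order is a choice made here, since the question does not specify a tie-breaker. In SQL the same idea is commonly expressed with `row_number()` partitioned by FieldA and ordered by a CASE expression that sorts X first, but this sketch does not claim to be the answer given in the original thread.

```python
def pick_preferred(rows, preferred="X"):
    """Return one (FieldA, FieldB, FieldC) row per FieldA value.

    A row whose FieldC equals `preferred` replaces a non-preferred row
    for the same FieldA; otherwise the first row seen is kept.
    """
    best = {}
    for row in rows:
        field_a, _, field_c = row
        current = best.get(field_a)
        if current is None or (current[2] != preferred and field_c == preferred):
            best[field_a] = row
    return list(best.values())


# Sample data taken from the question.
rows = [
    (1111, "ABC", "X"),
    (1111, "DEF", "Y"),
    (1111, "GHI", "X"),
    (2222, "JKL", "Y"),
    (2222, "MNO", "X"),
    (3333, "PQR", "U"),
    (3333, "STT", "U"),
]

for row in pick_preferred(rows):
    print(*row)
```

A plain `rank()` assigns the same rank to tied rows, which may be one source of the inconsistency the question mentions; picking exactly one row per group requires a rule that cannot tie.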

Spark throws error when reading Hive table

Question: I am trying to run select * from db.abc in Hive. This Hive table was loaded using Spark. The query does not work and shows an error: Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0). When I set the following properties, I was able to query the table from Hive: set hive.mapred.mode=nonstrict; set hive.optimize.ppd=true; set hive.optimize.index.filter=true; set hive.tez.bucket.pruning=true; set hive.explain.user=false; set hive.fetch.task.conversion=none; Now when

How to get lastaltertimestamp from Hive table?

Question: Teradata has the concept of lastaltertimestamp, which is the last time an alter table command was executed on a table. lastaltertimestamp can be queried. Does Hive have a similar value that can be queried? The timestamp returned by hdfs dfs -ls /my/hive/file does not reflect alter table commands, so alter table must not modify the file backing the Hive table. describe formatted does not provide a last-alter timestamp either. Thanks. Answer 1: Hive stores metadata in a database, so files never get