How to alter Hive partition column name

笑着哭i 提交于 2019-12-05 01:59:40

You can change column name in metadata by following: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment

But as the document says, it only changes the metadata. Hive partitions are implemented as directories with the naming pattern columnName=spec. So you also need to change the names of those directories on HDFS by using "hadoop fs" command.

You have alter the partition column using simple swap method.

  • Create a new temp table which is same schema as current table.
  • Move all files in the old table to newly create table location.

    hadoop fs -mv <current_table_name> <temp_table_name>

  • Alter the schema of the original table (Rename or drop the partitions)
  • Recopy/load the temp table data to the original table with appropriate partition values.

    hadoop fs -mv <temp_table_name> <current_table_name>

  • msck repair the the original table & drop the temp_table.

NOTE : mv command move the file from one location to another with reducing the copy time. alternately we can use LOAD DATA INPATH for copy the data to the original table.

You can not change the partition column in hive infact Hive does not support alterting of partitioning columns

You can think of it this way - Hive stores the data by creating a folder in hdfs with partition column values - Since if you trying to alter the hive partition it means you are trying to change the whole directory structure and data of hive table which is not possible exp if you have partitioned on year this is how directory structure looks like

tab1/clientdata/**2009**/file2
tab1/clientdata/**2010**/file3

If you want to change the partition column you can perform below steps

Create another hive table with required changes in partition column

Create table new_table ( A int, B String.....)

Load data from previous table

Insert into new_table partition ( B ) select A,B from table Prev_table

As you said, rename the value for of the partition is very straightforward:

hive> ALTER TABLE test.usage PARTITION (country ='US') RENAME TO PARTITION (date='USA');

I know that this is not what you are looking for. Unfortunately, given that your data is already partitioned by country, the only option you have is to drop the table, remove the data (supposing your table is external) from the HDFS and reinsert the data using continent as partition.

What I would do in your case is to have multiple partition levels, so that your folder structure will look like that:

/path/to/the/data/continent='america'/country='usa'
/path/to/the/data/continent='america'/country='mexico'
/path/to/the/data/continent='europe'/country='spain'
/path/to/the/data/continent='europe'/country='italy'
...

That way you can query the data for different levels of granularity (in this case continent and country).

Adding solution here for later:

  • Use case: Change partition column from STRING to INT

    set hive.mapred.mode=norestrict; 
    alter table {table_name} partition column ({column_name} {column_type}); 
    
    e.g. ALTER TABLE employee PARTITION COLUMN dept INT;
    
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!