How to alter Hive partition column name

六月ゝ 毕业季﹏ 提交于 2019-12-10 01:58:56

问题


I have to change the partition column name (not partition spec), I looked for the commands in hive wiki and some google pages. I can find the options for altering the partition spec,

i.e. For example

In /table/country='US' I can change US to USA, but I want to change country to continent.

I feel like the only option available for changing partition column name is dropping and re-creating the table. Is there is any other option available please help me.

Thanks in advance.


回答1:


You can change column name in metadata by following: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName/Type/Position/Comment

But as the document says, it only changes the metadata. Hive partitions are implemented as directories with the naming pattern columnName=spec. So you also need to change the names of those directories on HDFS by using "hadoop fs" command.




回答2:


You have alter the partition column using simple swap method.

  • Create a new temp table which is same schema as current table.
  • Move all files in the old table to newly create table location.

    hadoop fs -mv <current_table_name> <temp_table_name>

  • Alter the schema of the original table (Rename or drop the partitions)
  • Recopy/load the temp table data to the original table with appropriate partition values.

    hadoop fs -mv <temp_table_name> <current_table_name>

  • msck repair the the original table & drop the temp_table.

NOTE : mv command move the file from one location to another with reducing the copy time. alternately we can use LOAD DATA INPATH for copy the data to the original table.




回答3:


You can not change the partition column in hive infact Hive does not support alterting of partitioning columns

You can think of it this way - Hive stores the data by creating a folder in hdfs with partition column values - Since if you trying to alter the hive partition it means you are trying to change the whole directory structure and data of hive table which is not possible exp if you have partitioned on year this is how directory structure looks like

tab1/clientdata/**2009**/file2
tab1/clientdata/**2010**/file3

If you want to change the partition column you can perform below steps

Create another hive table with required changes in partition column

Create table new_table ( A int, B String.....)

Load data from previous table

Insert into new_table partition ( B ) select A,B from table Prev_table



回答4:


As you said, rename the value for of the partition is very straightforward:

hive> ALTER TABLE test.usage PARTITION (country ='US') RENAME TO PARTITION (date='USA');

I know that this is not what you are looking for. Unfortunately, given that your data is already partitioned by country, the only option you have is to drop the table, remove the data (supposing your table is external) from the HDFS and reinsert the data using continent as partition.

What I would do in your case is to have multiple partition levels, so that your folder structure will look like that:

/path/to/the/data/continent='america'/country='usa'
/path/to/the/data/continent='america'/country='mexico'
/path/to/the/data/continent='europe'/country='spain'
/path/to/the/data/continent='europe'/country='italy'
...

That way you can query the data for different levels of granularity (in this case continent and country).




回答5:


Adding solution here for later:

  • Use case: Change partition column from STRING to INT

    set hive.mapred.mode=norestrict; 
    alter table {table_name} partition column ({column_name} {column_type}); 
    
    e.g. ALTER TABLE employee PARTITION COLUMN dept INT;
    


来源:https://stackoverflow.com/questions/32111036/how-to-alter-hive-partition-column-name

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!