How to partition a Hive Table using range of values for a column

霸气de小男生 提交于 2019-12-19 09:08:47

问题


I have a Hive Table with 2 columns.Employee ID and Salary.

Data is something like given below.

Employee ID Salary
1   10000.08
2   20078.67
3   20056.45
4   30000.76
5   10045.14
6   43567.76

I want to create Partitions based on Salary Column.For Example Partition for salary range 10000 to 20000, 20001 to 30000.

How do i achieve this.


回答1:


Hive does not support range partitioning, but you can calculate ranges during data load.

  1. Create table partitioned by salary_range:

    create table your_table
    (
     employee_id bigint,
     salary double
    )
    partitioned by (salary_range bigint)
    
  2. insert using case for salary range calculation:

    insert overwrite table your_table partition (salary_range)   
    select employee_id, salary,  
           case 
               when salary between 10000 and 20000 then 20000
               when salary between 20001 and 30000 then 30000 
               ...
               else ...
           end as salary_range 
    from some_table;
    


来源:https://stackoverflow.com/questions/45398092/how-to-partition-a-hive-table-using-range-of-values-for-a-column

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!