Hive query with certain specific exclude conditions

Submitted by 邮差的信 on 2019-12-25 08:36:42

Question


I am trying to build a Hive query that counts customers who have used only the features below, or some combination of them. The features are:

name = "summary"

name = "details"

name1 = "vehicle stats"

name1 = "accelerometer"

I have to count the number of customers who strictly follow the above conditions. For example, in the table below, customer "Joy" should not be counted because he has additionally done "expenses" in name, even though he has both "summary" and "details" in name and "vehicle stats" and "accelerometer" in name1.

Similarly, customer "Lan" should not be counted, as he has additionally done "speeding" in name1, which is not among the above conditions.

    customername    name        name1
    Joy             summary     vehicle stats
    Joy             details     accelerometer
    Joy             expenses    speeding
    Lan             summary     vehicle stats
    Lan             details     accelerometer   
    Lan             details     speeding
    Hana            details     accelerometer
    Hana            summary     vehicle stats

The count for the table above has to be 1, as there is only one customer (Hana) who has done only "summary" and "details" in name and "vehicle stats" and "accelerometer" in name1.

This is the query that I currently have:

    select name, name1, count(distinct(customername))
    from table1
    where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
    group by name, name1
    having name in ('summary', 'details') 
    or name1 in ('vehicle stats', 'accelerometer')

Any suggestions would be great!!


Answer 1:


Part 1

    select      customername

    from        table1

    group by    customername

    having      count 
                (
                    case 
                        when    name  in ('summary', 'details') 
                             or name1 in ('vehicle stats','accelerometer')
                        then    1
                    end
                ) > 0

            and count 
                (
                    case 
                        when    name  not in ('summary', 'details') 
                             or name1 not in ('vehicle stats','accelerometer')
                        then    1
                    end
                ) = 0

    +--------------+
    | customername |
    +--------------+
    | Hana         |
    +--------------+
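The question asks for a single count rather than a list of names, so the Part 1 query above can be wrapped in an outer count(*). The following is only a sketch under some assumptions: the sample rows from the question are inlined as a CTE named table1 so it can run on its own (this needs a Hive version that supports CTEs and select without from), the output alias customers and the subquery alias t are made up for illustration, and the two having conditions are collapsed into one sum that also disqualifies rows with NULL in either column, which is a variation on the answer rather than the answer's own query.

```sql
-- Sample rows from the question, inlined so the sketch is self-contained.
with table1 as (
    select 'Joy'  as customername, 'summary'  as name, 'vehicle stats' as name1
    union all
    select 'Joy'  as customername, 'details'  as name, 'accelerometer' as name1
    union all
    select 'Joy'  as customername, 'expenses' as name, 'speeding'      as name1
    union all
    select 'Lan'  as customername, 'summary'  as name, 'vehicle stats' as name1
    union all
    select 'Lan'  as customername, 'details'  as name, 'accelerometer' as name1
    union all
    select 'Lan'  as customername, 'details'  as name, 'speeding'      as name1
    union all
    select 'Hana' as customername, 'details'  as name, 'accelerometer' as name1
    union all
    select 'Hana' as customername, 'summary'  as name, 'vehicle stats' as name1
)
select count(*) as customers
from (
    select   customername
    from     table1
    -- on the real table, the date_time filter from the question
    -- would go here as a where clause
    group by customername
    -- a row scores 0 only if both columns are in the allowed lists;
    -- any other row (including one with a NULL) scores 1 and
    -- disqualifies the customer
    having   sum(case
                     when name  in ('summary', 'details')
                      and name1 in ('vehicle stats', 'accelerometer')
                     then 0
                     else 1
                 end) = 0
) t;
```

On the sample rows this should return 1, matching the count expected in the question: Joy and Lan each have a row that scores 1, and only Hana's rows all score 0.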

Part 2

    select      name
               ,name1
               ,count(*)

    from       (select      sort_array(collect_set(name))   as name
                           ,sort_array(collect_set(name1))  as name1

                from        table1

                group by    customername

                having      count 
                            (
                                case 
                                    when    name  in ('summary', 'details') 
                                         or name1 in ('vehicle stats','accelerometer')
                                    then    1
                                end
                            ) > 0

                        and count 
                            (
                                case 
                                    when    name  not in ('summary', 'details') 
                                         or name1 not in ('vehicle stats','accelerometer')
                                    then    1
                                end
                            ) = 0
                ) t

    group by    name
               ,name1

    +-----------------------+-----------------------------------+----+
    |         name          |               name1               | c2 |
    +-----------------------+-----------------------------------+----+
    | ["details","summary"] | ["accelerometer","vehicle stats"] |  1 |
    +-----------------------+-----------------------------------+----+



Answer 2:


You can also use collect_set to check only for the specified entries in those columns.

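    select customername
    from table1
    where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
    group by customername
    having concat_ws(',', sort_array(collect_set(name))) = 'details,summary'
    and concat_ws(',', sort_array(collect_set(name1))) = 'accelerometer,vehicle stats'

collect_set does not guarantee the order of its elements, so you have to sort its output (here with sort_array) before concatenating it for comparison, and the string literals must list the values in the same sorted order. Note that this matches only customers who have exactly these two values in each column.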



Source: https://stackoverflow.com/questions/44286001/hive-query-with-certain-specific-exclude-conditions
