Question
I am trying to build a Hive query that counts customers who have used only the features below, or a combination of them. The features are:
name = "summary"
name = "details"
name1 = "vehicle stats"
name1 = "accelerometer"
I have to count the number of customers who strictly follow the above conditions. For example, in the table below, customer "Joy" should not be counted: he has both "summary" and "details" in name and both "vehicle stats" and "accelerometer" in name1, but he has additionally done "expenses" in name.
Similarly, customer "Lan" should not be counted, as he has additionally done "speeding" in name1, which is not in the above conditions.
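+--------------+----------+---------------+
| customername | name     | name1         |
+--------------+----------+---------------+
| Joy          | summary  | vehicle stats |
| Joy          | details  | accelerometer |
| Joy          | expenses | speeding      |
| Lan          | summary  | vehicle stats |
| Lan          | details  | accelerometer |
| Lan          | details  | speeding      |
| Hana         | details  | accelerometer |
| Hana         | summary  | vehicle stats |
+--------------+----------+---------------+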
The count for the table above has to be 1, as there is only one customer (Hana) who has done only "summary" and "details" in name and only "vehicle stats" and "accelerometer" in name1.
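For anyone who wants to reproduce the sample, the rows above could be loaded with something like the following. This is a minimal sketch: the column types are assumptions, and the date_time column used in the query below is left out because the sample rows do not show it, so the date filter would have to be dropped when running against this table.

```sql
-- Sketch only: column types are assumed, and date_time is omitted
-- because the sample rows in the question do not include it.
create table table1 (
    customername string,
    name         string,
    name1        string
);

-- The eight sample rows, copied from the table in the question.
insert into table table1 values
    ('Joy',  'summary',  'vehicle stats'),
    ('Joy',  'details',  'accelerometer'),
    ('Joy',  'expenses', 'speeding'),
    ('Lan',  'summary',  'vehicle stats'),
    ('Lan',  'details',  'accelerometer'),
    ('Lan',  'details',  'speeding'),
    ('Hana', 'details',  'accelerometer'),
    ('Hana', 'summary',  'vehicle stats');
```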
This is the query that I currently have:
select name, name1, count(distinct(customername))
from table1
where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
group by name, name1
having name in ('summary', 'details')
or name1 in ('vehicle stats', 'accelerometer')
Any suggestions would be great!!
Answer 1:
Part 1
select customername
from table1
group by customername
having count
       (
           case
               when name in ('summary', 'details')
                 or name1 in ('vehicle stats','accelerometer')
               then 1
           end
       ) > 0
   and count
       (
           case
               when name not in ('summary', 'details')
                 or name1 not in ('vehicle stats','accelerometer')
               then 1
           end
       ) = 0
+--------------+
| customername |
+--------------+
| Hana         |
+--------------+
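The query above returns the matching customers rather than the single number the question asks for. One way to get the count is to wrap it in an outer count(*). This is a minimal sketch, not a tested solution: the alias customer_count is made up for illustration, and the date_time filter is copied from the question's own query on the assumption that it should still apply.

```sql
-- Sketch: count the customers whose rows all stay inside the allowed features.
-- customer_count is a hypothetical alias; the date filter comes from the question.
select count(*) as customer_count
from (
    select customername
    from table1
    where date_time between '2017-01-01 00:00:00' and '2017-01-10 00:00:00'
    group by customername
    -- at least one row uses an allowed feature
    having count(case
                     when name in ('summary', 'details')
                       or name1 in ('vehicle stats', 'accelerometer')
                     then 1
                 end) > 0
    -- and no row uses a feature outside the allowed lists
       and count(case
                     when name not in ('summary', 'details')
                       or name1 not in ('vehicle stats', 'accelerometer')
                     then 1
                 end) = 0
) t
```

For the sample rows in the question, and assuming they all fall inside the date range, only Hana passes both conditions, so the outer count should be 1.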
Part 2
select name
      ,name1
      ,count(*)
from (select sort_array(collect_set(name)) as name
            ,sort_array(collect_set(name1)) as name1
      from table1
      group by customername
      having count
             (
                 case
                     when name in ('summary', 'details')
                       or name1 in ('vehicle stats','accelerometer')
                     then 1
                 end
             ) > 0
         and count
             (
                 case
                     when name not in ('summary', 'details')
                       or name1 not in ('vehicle stats','accelerometer')
                     then 1
                 end
             ) = 0
     ) t
group by name
        ,name1
+-----------------------+-----------------------------------+----+
| name                  | name1                             | c2 |
+-----------------------+-----------------------------------+----+
| ["details","summary"] | ["accelerometer","vehicle stats"] | 1  |
+-----------------------+-----------------------------------+----+
Answer 2:
You can also use collect_set to check that those columns contain only the specified entries.
select customername
from table1
where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
group by customername
having concat_ws(',', sort_array(collect_set(name))) = 'details,summary'
and concat_ws(',', sort_array(collect_set(name1))) = 'accelerometer,vehicle stats'
The output of collect_set has to be sorted with sort_array before it is concatenated, because the order of its elements is not guaranteed. The expected strings are therefore written in sorted order.
Source: https://stackoverflow.com/questions/44286001/hive-query-with-certain-specific-exclude-conditions