问题
We have two columns Id and month Id.
The output what I'm looking for is to divide year from month Id based on quarter granularity. The activity column should be from quarter. If id is active activity should be 1 else 0 .If id comes in any of the 1st quarter (eg:only 1) the activity is still 1 .
Like this:
id month_dt
-----------------------------------
1000000000 2012-03-01 00:00:00.0
1000000000 2015-09-01 00:00:00.0
1000000000 2016-10-01 00:00:00.0
1000000000 2015-11-01 00:00:00.0
1000000000 2014-01-01 00:00:00.0
1000000000 2013-04-01 00:00:00.0
1000000000 2014-12-01 00:00:00.0
1000000000 2015-02-01 00:00:00.0
1000000000 2014-06-01 00:00:00.0
1000000000 2013-01-01 00:00:00.0
1000000000 2014-05-01 00:00:00.0
1000000000 2016-05-01 00:00:00.0
1000000000 2013-07-01 00:00:00.0
What is expected:
ID YEAR QTR ACTIVITY (1 or 0)
--------------------------------------------------
1000000000 2012 1 1
1000000000 2012 2 0
1000000000 2012 3 0
1000000000 2012 4 0
1000000000 2013 1 1
1000000000 2013 2 1
1000000000 2013 3 1
1000000000 2013 4 0
Below is the one I tried but it doesn't return the expected results. Please help me achieve this
SELECT
a.id, a.year,
SUM(CASE WHEN quarter BETWEEN 1 AND 3 THEN 1 ELSE 0 END) AS Q1,
SUM(CASE WHEN quarter BETWEEN 4 AND 6 THEN 1 ELSE 0 END) AS Q2,
SUM(CASE WHEN quarter BETWEEN 7 AND 9 THEN 1 ELSE 0 END) AS Q3,
SUM(CASE WHEN quarter BETWEEN 10 AND 12 THEN 1 ELSE 0 END) AS Q4
FROM
(SELECT
id,
TRIM(SUBSTRING(month_id, 1, 4)) AS year,
TRIM(regexp_replace(SUBSTR(month_id, 5, 4), "-", "")) as quarter
FROM
test.patientid) a
GROUP BY
a.id, a.year
回答1:
I think you are looking for something like this:
select y.yyyy, q.q,
(case when count(t.month_dt) > 0 then 1 else 0 end) as activity_flag
from (select distinct year(month_dt) as yyyy from t) y cross join
(select distinct quarter(month_dt) as q from t) q left join
t
on year(t.month_dt) = y.yyyy and quarter(t.month_dt) = q.q
group by y.yyyy, q.q;
This assumes that there is at least one activity for each quarter in a year (regardless of the quarter). Otherwise, you just need to put in a list of 1, 2, 3, and 4 to get the quarters.
回答2:
@Babu; If the function quarter does not exist in your version of hive, I have an alternate function for getting the quarter on a give date. Hope this helps. Thanks!
create table qtrs(qtr int);
insert into qtrs values (1),(2),(3),(4);
create table ims
(id int,
month_dt date
);
insert into ims values
(100, '2012-03-01'),
(100, '2013-04-01'),
(100, '2013-01-01'),
(100, '2013-07-01'),
(100, '2014-01-01'),
(100, '2014-05-01'),
(100, '2014-06-01'),
(100, '2014-12-01'),
(100, '2015-02-01'),
(100, '2015-09-01'),
(100, '2015-11-01'),
(100, '2016-05-01'),
(100, '2016-10-01');
insert into ims values
(200, '2012-03-01'),
(200, '2013-04-01');
Query:
select DISTINCT NVL(ims.id, qtr.id) as id,qtr.year as year,qtr.qtr as qtr,
IF(ims.id is null, 0, 1) as activity
from jbacoy.ims ims
right join (select distinct ims.id,YEAR(ims.month_dt) as year,qtrs.qtr from jbacoy.ims ims join jbacoy.qtrs qtrs) qtr
on (ims.id=qtr.id and year(ims.month_dt)=qtr.year and int((month(month_dt)-1)/3)+1=qtr.qtr)
sort by id, year, qtr;
Result:
id year qtr activity
100 2012 1 1
100 2012 2 0
100 2012 3 0
100 2012 4 0
100 2013 1 1
100 2013 2 1
100 2013 3 1
100 2013 4 0
100 2014 1 1
100 2014 2 1
100 2014 3 0
100 2014 4 1
100 2015 1 1
100 2015 2 0
100 2015 3 1
100 2015 4 1
100 2016 1 0
100 2016 2 1
100 2016 3 0
100 2016 4 1
200 2012 1 1
200 2012 2 0
200 2012 3 0
200 2012 4 0
200 2013 1 0
200 2013 2 1
200 2013 3 0
200 2013 4 0
来源:https://stackoverflow.com/questions/48732957/divide-ids-based-on-quarter-and-the-count-either-1-or-0-by-determining-the-quart