Question
I have the following dataset:
id | date | state
-----------------------
1 | 01/01/17 | high
1 | 02/01/17 | high
1 | 03/01/17 | high
1 | 04/01/17 | miss
1 | 05/01/17 | high
2 | 01/01/17 | miss
2 | 02/01/17 | high
2 | 03/01/17 | high
2 | 04/01/17 | miss
2 | 05/01/17 | miss
2 | 06/01/17 | high
I want to create a column rank_state which ranks, within groups of id, the entries as per increasing date (starting from rank 0) which do not have the state of "miss". Furthermore, the rank repeats itself if the entry has a state of "miss". The output should look like:
id | date | state | rank_state
------------------------------------
1 | 01/01/17 | high | 0
1 | 02/01/17 | high | 1
1 | 03/01/17 | high | 2
1 | 04/01/17 | miss | 2
1 | 05/01/17 | high | 3
2 | 01/01/17 | miss | 0
2 | 02/01/17 | high | 0
2 | 03/01/17 | high | 1
2 | 04/01/17 | miss | 1
2 | 05/01/17 | miss | 1
2 | 06/01/17 | high | 2
For example, the 4th row has a rank of 2 since its state is "miss", i.e. it repeats the rank of row 3 (the same applies to rows 9 and 10). Please note that rows 6 and 7 should both have rank 0.
I have tried the following:
,(case when state is not in ('miss') then (rank() over (partition by id order by date desc) - 1) end) as state_rank
and
,rank() over (partition by id order by case when state is not in ('miss') then date end) as state_rank
but neither gives me the desired result. Any ideas would be very helpful.
Answer 1:
More than likely you want:
SELECT *,
       GREATEST(
           COUNT(CASE WHEN state != 'miss' THEN 1 ELSE NULL END)
               OVER (PARTITION BY id ORDER BY date) - 1,
           0
       ) AS "state_rank"
FROM tbl;
Basically:
- partition the window by id
- only count the rows that aren't 'miss'
- because the count minus 1 is negative when a group starts with a 'miss' row, you can slap on GREATEST to use 0 instead (preventing negatives)
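The points above amount to a running count per id. A minimal Python sketch of the same logic, using the sample rows from the question, may make that easier to see. The function name rank_state is made up for illustration, and the rows are assumed to be already sorted by id and then by date:

```python
from itertools import groupby

# Sample rows from the question: (id, date, state), already sorted by id, then date.
ROWS = [
    (1, "01/01/17", "high"),
    (1, "02/01/17", "high"),
    (1, "03/01/17", "high"),
    (1, "04/01/17", "miss"),
    (1, "05/01/17", "high"),
    (2, "01/01/17", "miss"),
    (2, "02/01/17", "high"),
    (2, "03/01/17", "high"),
    (2, "04/01/17", "miss"),
    (2, "05/01/17", "miss"),
    (2, "06/01/17", "high"),
]


def rank_state(rows):
    """Hypothetical helper: running count of non-'miss' rows per id, minus 1, floored at 0."""
    ranks = []
    for _, group in groupby(rows, key=lambda r: r[0]):  # one group per id
        seen = 0  # non-'miss' rows seen so far within this id
        for _id, _date, state in group:
            if state != "miss":
                seen += 1
            # seen - 1 is -1 while a group has only produced 'miss' rows,
            # which is the case GREATEST(..., 0) guards against in the SQL.
            ranks.append(max(seen - 1, 0))
    return ranks


print(rank_state(ROWS))
```

A 'miss' row leaves the counter unchanged, so it repeats the rank of the previous row, and a leading 'miss' row is clamped to 0.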
Answer 2:
Just add a frame clause to vol7ron's answer, since Redshift requires it:
SELECT *,
       GREATEST(
           COUNT(CASE WHEN state != 'miss' THEN 1 ELSE NULL END)
               OVER (PARTITION BY id ORDER BY date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1,
           0
       ) AS state_rank
FROM tbl;
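To check the query against the sample data without a Redshift cluster, here is a rough sketch that runs it in an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answers: SQLite stands in for Redshift, its two-argument MAX replaces GREATEST (which SQLite does not provide), SQLite 3.25 or newer is needed for window functions, and the dates are stored as text, which happens to sort correctly for this sample only:

```python
import sqlite3

# In-memory stand-in for the real table; the name tbl follows the answers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, date TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "01/01/17", "high"),
        (1, "02/01/17", "high"),
        (1, "03/01/17", "high"),
        (1, "04/01/17", "miss"),
        (1, "05/01/17", "high"),
        (2, "01/01/17", "miss"),
        (2, "02/01/17", "high"),
        (2, "03/01/17", "high"),
        (2, "04/01/17", "miss"),
        (2, "05/01/17", "miss"),
        (2, "06/01/17", "high"),
    ],
)

# Same shape as the query above, with SQLite's scalar MAX(a, b) in place of GREATEST.
QUERY = """
SELECT id, date, state,
       MAX(COUNT(CASE WHEN state != 'miss' THEN 1 END)
               OVER (PARTITION BY id ORDER BY date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1,
           0) AS state_rank
FROM tbl
ORDER BY id, date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On real data the date column should be a proper date type (or an ISO-formatted string), since ordering dd/mm/yy text breaks as soon as the rows span more than one month.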
Source: https://stackoverflow.com/questions/43164608/rank-rows-in-a-column-under-conditions-on-a-different-column