Question
I have a table of orders with a column denoting whether each order is a buy or a sell, with the rows typically ordered by timestamp. What I'd like to do is operate on groups of consecutive buys, plus the sell that follows them. For example: B B S B S B B S -> (B B S) (B S) (B B S)
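To pin down the grouping being asked for, here is a minimal plain-Python sketch (not SQL, and not part of the question). The helper name group_orders is hypothetical; it simply closes a group at every sell and keeps any trailing buys as a final open group.

```python
from typing import List


def group_orders(actions: List[str]) -> List[List[str]]:
    """Split 'B'/'S' actions into groups that each end with a sell.

    Hypothetical helper for illustration only. Buys that are not yet
    followed by a sell are kept as a final, open group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for action in actions:
        current.append(action)
        if action == "S":  # a sell closes the current group
            groups.append(current)
            current = []
    if current:  # trailing buys with no closing sell
        groups.append(current)
    return groups


print(group_orders("B B S B S B B S".split()))
# [['B', 'B', 'S'], ['B', 'S'], ['B', 'B', 'S']]
```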
Example:
order_action | timestamp
-------------+---------------------
buy          | 2013-10-03 13:03:02
buy          | 2013-10-08 13:03:02
sell         | 2013-10-10 15:58:02
buy          | 2013-11-01 09:30:02
buy          | 2013-11-01 14:03:02
sell         | 2013-11-07 10:34:02
buy          | 2013-12-03 15:46:02
sell         | 2013-12-09 16:00:03
buy          | 2013-12-11 13:02:02
sell         | 2013-12-18 15:59:03
I'll be running an aggregation function in the end (I need the groups so that I can exclude an entire group based on its sell order), so GROUP BY or partitioned windows seemed like the right way to go, but I can't figure out how to get this specific grouping.
Answer 1:
This can be surprisingly simple with count() as a window aggregate function:
SELECT *
,count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM orders;
I'm using ts instead of timestamp as the column name. Avoid reserved words as identifiers (TIMESTAMP is a reserved word in standard SQL and a type name in PostgreSQL).
count() only counts non-null values. The expression order_action = 'sell' OR NULL results in TRUE for 'sell' and NULL otherwise, because TRUE OR NULL is TRUE while FALSE OR NULL is NULL. count() returns a running count with the default frame definition, from the start of the partition (the whole table in this case) up to the last peer of the current row. The running count of sells groups your rows as requested.
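The OR NULL trick can be checked in isolation. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL, on the assumption that SQLite applies the same three-valued logic to OR and the same NULL-skipping rule to count(); SQLite reports TRUE as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# TRUE OR NULL -> TRUE (1); FALSE OR NULL -> NULL (None in Python).
row = conn.execute(
    "SELECT ('sell' = 'sell' OR NULL), ('buy' = 'sell' OR NULL)"
).fetchone()
print(row)  # (1, None)

# count() skips the NULLs produced for non-sell rows, so only sells are counted.
n = conn.execute(
    "SELECT count(a = 'sell' OR NULL) "
    "FROM (SELECT 'buy' AS a UNION ALL SELECT 'sell' UNION ALL SELECT 'buy')"
).fetchone()[0]
print(n)  # 1
```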
I am ordering descending in the OVER clause to let each group end at a trailing "sell", not a leading "sell". This results in descending group numbers, but that should not matter: you just need group numbers.
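To see the descending running count on the question's sample rows, here is a sketch that loads them into an in-memory SQLite table and runs the first query. It assumes SQLite 3.25 or newer (for window functions) as a stand-in for PostgreSQL, and uses the column name ts as in this answer.

```python
import sqlite3

# Sample rows from the question, with the timestamp column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"),
    ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

# Running count of sells at or after each row (window ordered by ts DESC).
rows = conn.execute(
    """
    SELECT order_action, ts,
           count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
    FROM orders
    ORDER BY ts
    """
).fetchall()

for action, ts, grp in rows:
    print(grp, action, ts)
```

Listed in time order, the group numbers come out as 4, 4, 4, 3, 3, 3, 2, 2, 1, 1: every group ends with its sell, and the numbers count down.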
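Duplicate timestamps would be a problem (in any case!): rows with the same ts are peers, share one running count, and have no defined order among themselves.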
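A small sketch of how duplicates go wrong, again with SQLite (3.25+) standing in for PostgreSQL. The id column is an assumption (think insertion order); the question's table has no such column. A sell and the next buy share a timestamp, so under the default frame they are peers and the buy is pulled into the earlier group; adding a unique tie-breaker to the window's ORDER BY removes the peers.

```python
import sqlite3

# Hypothetical rows: sell 2 and buy 3 share a timestamp.
# The id column (insertion order) is an assumption, not part of the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_action TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "buy", "2013-10-03 13:03:02"),
        (2, "sell", "2013-10-10 15:58:02"),
        (3, "buy", "2013-10-10 15:58:02"),
        (4, "sell", "2013-10-17 15:58:02"),
    ],
)


def groups(window_order: str) -> list:
    """Group number per row (listed by id) for a given window ORDER BY list.

    window_order is spliced into the SQL, so only pass trusted literals.
    """
    sql = (
        "SELECT count(order_action = 'sell' OR NULL) "
        f"OVER (ORDER BY {window_order}) FROM orders ORDER BY id"
    )
    return [grp for (grp,) in conn.execute(sql)]


print(groups("ts DESC"))           # [2, 2, 2, 1]: buy 3 is a peer of sell 2
print(groups("ts DESC, id DESC"))  # [2, 2, 1, 1]: buy 3 joins sell 4's group
```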
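One way to get ascending group numbers is a custom frame definition for the window function that counts only the sells strictly before the current row, so each sell gets the same number as the buys that precede it: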
SELECT *
,count(order_action = 'sell' OR NULL)
OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM orders;
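Here is a sketch of the ascending variant on the same sample rows, assuming SQLite 3.25+ as a stand-in for PostgreSQL. The frame ends one row before the current row, so the first row sees an empty frame and gets 0.

```python
import sqlite3

# Sample rows from the question, with the timestamp column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"),
    ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

# Count sells strictly before the current row, in ascending ts order.
rows = conn.execute(
    """
    SELECT order_action, ts,
           count(order_action = 'sell' OR NULL)
             OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
    FROM orders
    ORDER BY ts
    """
).fetchall()

for action, ts, grp in rows:
    print(grp, action, ts)
```

The groups now read 0, 0, 0, 1, 1, 1, 2, 2, 3, 3 in time order.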
SQL Fiddle demonstrating both.
Answer 2:
I don't have PostgreSQL, so I tried this on SQL Fiddle:
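with sells as (
  -- one row per sell: its group number, the previous sell's timestamp and its own
  select
    rank() over w as grp,
    lag(timestamp, 1, '2000-01-01') over w as sd,  -- sentinel assumes no order predates 2000-01-01
    timestamp as td
  from orders
  where order_action = 'sell'
  window w as (order by timestamp)
)
-- each order belongs to the sell that closes its range (sd, td]
select
  s.grp,
  o.order_action,
  o.timestamp
from orders o
join sells s
  on o.timestamp > s.sd
 and o.timestamp <= s.td
order by o.timestamp;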
Let me know if this works for you. This was my first time using PostgreSQL and I like it.
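The range-join idea above can be tried locally with a sketch like this one, using SQLite 3.25+ as a stand-in for PostgreSQL. As an assumption, the column is named ts instead of timestamp (following the advice in Answer 1); otherwise the query mirrors the answer's CTE and join.

```python
import sqlite3

# Sample rows from the question, with the timestamp column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"),
    ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

rows = conn.execute(
    """
    WITH sells AS (
        SELECT rank() OVER w AS grp,                   -- number of this sell
               lag(ts, 1, '2000-01-01') OVER w AS sd,  -- previous sell, or sentinel
               ts AS td                                -- this sell's timestamp
        FROM orders
        WHERE order_action = 'sell'
        WINDOW w AS (ORDER BY ts)
    )
    SELECT s.grp, o.order_action, o.ts
    FROM orders o
    JOIN sells s ON o.ts > s.sd AND o.ts <= s.td  -- half-open range (sd, td]
    ORDER BY o.ts
    """
).fetchall()

for grp, action, ts in rows:
    print(grp, action, ts)
```

On this data every row lands in a group (1, 1, 1, 2, 2, 2, 3, 3, 4, 4). Two properties worth knowing: rank() gives sells with equal timestamps the same number, and the inner join returns no row for buys that come after the last sell.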
Answer 3:
You can characterize the groups by counting the number of sells at or after each row. A cumulative (windowed) sum does this and yields a group number that can then be used for aggregation. Here is an example:
select min(timestamp), max(timestamp),
       sum(case when order_action = 'buy' then 1 else 0 end) as buys
from (select o.*,
             -- running count of sells at or after this row: the group number
             sum(case when order_action = 'sell' then 1 else 0 end)
               over (order by timestamp desc) as grp
      from orders o
     ) o
group by grp;
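A sketch that runs this aggregation on the sample rows, with SQLite 3.25+ standing in for PostgreSQL and the column renamed to ts (an assumption). The second query illustrates the question's end goal of excluding whole groups based on their sell; the December 2013 cutoff is a hypothetical criterion chosen only for the example.

```python
import sqlite3

# Sample rows from the question, with the timestamp column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"),
    ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"),
    ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"),
    ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"),
    ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"),
    ("sell", "2013-12-18 15:59:03"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)

# Subquery shared by both queries: each row tagged with its group number.
GROUPED = """
    (SELECT o.*,
            sum(CASE WHEN order_action = 'sell' THEN 1 ELSE 0 END)
              OVER (ORDER BY ts DESC) AS grp
     FROM orders o) o
"""

# One row per group: first order, last order (the sell), number of buys.
rows = conn.execute(
    "SELECT min(ts), max(ts), "
    "sum(CASE WHEN order_action = 'buy' THEN 1 ELSE 0 END) AS buys "
    f"FROM {GROUPED} GROUP BY grp ORDER BY min(ts)"
).fetchall()

# Hypothetical filter: drop whole groups whose sell is on or after 2013-12-01.
kept = conn.execute(
    "SELECT grp, count(*) AS n_orders "
    f"FROM {GROUPED} GROUP BY grp "
    "HAVING max(CASE WHEN order_action = 'sell' THEN ts END) < '2013-12-01' "
    "ORDER BY grp DESC"
).fetchall()

print(rows)
print(kept)  # [(4, 3), (3, 3)]: the two B B S groups sold before December
```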
Source: https://stackoverflow.com/questions/24708067/grouping-based-on-sequence-of-rows