Grouping based on sequence of rows

Question

I have a table of orders with a column denoting whether each order is a buy or a sell; the rows are typically ordered by timestamp. I'd like to operate on groups of consecutive buys plus the sell that follows them, e.g. B B S B S B B S -> (B B S) (B S) (B B S)

Example:

order_action |      timestamp      
-------------+---------------------
buy          | 2013-10-03 13:03:02
buy          | 2013-10-08 13:03:02
sell         | 2013-10-10 15:58:02
buy          | 2013-11-01 09:30:02
buy          | 2013-11-01 14:03:02
sell         | 2013-11-07 10:34:02
buy          | 2013-12-03 15:46:02
sell         | 2013-12-09 16:00:03
buy          | 2013-12-11 13:02:02
sell         | 2013-12-18 15:59:03

I'll be running an aggregate function in the end (the groups let me exclude an entire group based on its sell order), so GROUP BY or partitioned windows seemed like the right way to go, but I can't figure out how to get this specific grouping.
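To pin down the grouping rule from the example, here is a minimal plain-Python sketch. It walks the actions in timestamp order and closes a group at every sell. The function name `group_until_sell` is hypothetical. Keeping trailing buys (with no sell yet) as an open final group is an assumption, because the question does not settle that case.

```python
from typing import List


def group_until_sell(actions: List[str]) -> List[List[str]]:
    """Split actions (already in timestamp order) into groups that each end at a sell."""
    groups: List[List[str]] = []
    current: List[str] = []
    for action in actions:
        current.append(action)
        if action == "sell":
            # A sell closes the current run of consecutive buys.
            groups.append(current)
            current = []
    if current:
        # Assumption: buys with no closing sell yet form an open last group.
        groups.append(current)
    return groups


seq = ["buy", "buy", "sell", "buy", "sell", "buy", "buy", "sell"]
print(group_until_sell(seq))
```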

Answer 1:

This can be surprisingly simple with count() as a window aggregate function:

SELECT *
      ,count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM   orders;

Using ts instead of timestamp as the column name. Avoid reserved words as identifiers.

count() only counts non-null values. The expression order_action = 'sell' OR NULL yields TRUE for 'sell' and NULL otherwise. With the default frame definition, count() returns a running count from the start of the partition (the whole table in this case) up to the last peer of the current row. The running count of sells groups your rows as requested.
I am ordering descending in the OVER clause so that each group ends at a trailing "sell", not a leading "sell". This produces descending group numbers, but that should not matter, since you just need group numbers.
Duplicate timestamps would be a problem (with any approach!): rows with equal ts are peers, so their relative order, and therefore their group, is ambiguous.
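Below is a runnable sketch of the descending query above. Python's built-in `sqlite3` module stands in for PostgreSQL. This is an assumption: it needs SQLite 3.25 or later, the release that added window functions. Timestamps are stored as ISO-8601 text, so string order matches time order. The helper name `descending_groups` is hypothetical. The sketch also probes the `OR NULL` trick: SQLite evaluates `'buy' = 'sell' OR NULL` to NULL, so `count()` skips it.

```python
import sqlite3

# Sample rows from the question, with the column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"), ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"), ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"), ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"), ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"), ("sell", "2013-12-18 15:59:03"),
]

# Running count of sells, counted from the newest row backwards.
QUERY = """
SELECT order_action, ts,
       count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM orders
ORDER BY ts
"""


def descending_groups(rows):
    """Load rows into an in-memory table and return (action, ts, grp) in ts order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# 'sell' gives a true value (1 in SQLite); anything else gives NULL (None).
probe = sqlite3.connect(":memory:")
print(probe.execute("SELECT 'sell' = 'sell' OR NULL, 'buy' = 'sell' OR NULL").fetchone())
probe.close()

for action, ts, grp in descending_groups(ORDERS):
    print(grp, action, ts)
```

Each sell shares a number with the buys just before it. Read from oldest to newest, the groups are numbered 4, 3, 2, 1, matching the answer's remark about descending group numbers.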

One way to get ascending group numbers is a custom frame definition for the window function:

SELECT *
      ,count(order_action = 'sell' OR NULL)
       OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders;
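The ascending variant can be checked the same way. This is a sketch under the same assumptions: SQLite 3.25+ via `sqlite3` instead of PostgreSQL, ISO-8601 text timestamps, and a hypothetical helper name `ascending_groups`. The frame stops one row before the current row. As a result, a sell is not yet counted on its own row and stays in the group of the buys before it.

```python
import sqlite3

# Sample rows from the question, with the column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"), ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"), ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"), ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"), ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"), ("sell", "2013-12-18 15:59:03"),
]

# Count only the sells strictly before the current row.
QUERY = """
SELECT order_action, ts,
       count(order_action = 'sell' OR NULL)
         OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM orders
ORDER BY ts
"""


def ascending_groups(rows):
    """Load rows into an in-memory table and return (action, ts, grp) in ts order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for action, ts, grp in ascending_groups(ORDERS):
    print(grp, action, ts)
```

The first row sees an empty frame, so count() returns 0 there. The groups then come out numbered 0, 1, 2, 3 in time order.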

SQL Fiddle demonstrating both.

Answer 2:

I don't have PostgreSQL, so I tried this on SQL Fiddle:

with sells as (
  select
    rank() over w grp,
    lag(timestamp,1,'2000-01-01') over w sd,
    timestamp td
  from
    orders
  where
    order_action = 'sell'
  window w as (order by timestamp)
)
select
  s.grp,
  o.order_action,
  o.timestamp
from
  orders o
join
  sells s
    on o.timestamp > s.sd
    and o.timestamp <= s.td
order by o.timestamp

Let me know if this works for you. This was my first time using PostgreSQL and I like it.
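The join query above can be exercised with a small sketch. It uses `sqlite3` (SQLite 3.25+) as an assumed stand-in for PostgreSQL, renames the column to ts, and inlines the named `window w` clause into each OVER. The helper name `sell_interval_groups` is hypothetical. Each sell defines the interval (previous sell, this sell], and every order inside it joins to that sell's rank.

```python
import sqlite3

# Sample rows from the question, with the column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"), ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"), ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"), ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"), ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"), ("sell", "2013-12-18 15:59:03"),
]

QUERY = """
WITH sells AS (
  SELECT rank() OVER (ORDER BY ts) AS grp,                    -- group number
         lag(ts, 1, '2000-01-01') OVER (ORDER BY ts) AS sd,   -- previous sell
         ts AS td                                             -- this sell
  FROM orders
  WHERE order_action = 'sell'
)
SELECT s.grp, o.order_action, o.ts
FROM orders o
JOIN sells s ON o.ts > s.sd AND o.ts <= s.td
ORDER BY o.ts
"""


def sell_interval_groups(rows):
    """Load rows into an in-memory table and return (grp, action, ts) in ts order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for grp, action, ts in sell_interval_groups(ORDERS):
    print(grp, action, ts)
```

Because this is an inner join, buys after the last sell are dropped. The '2000-01-01' default also assumes that no order is older than that date.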

Answer 3:

You can characterize the groups by counting the number of sells at or later than each row. A cumulative sum ordered by descending timestamp computes that count, which gives a group number that can then be used for aggregation. Here is an example:

select min(timestamp), max(timestamp), sum(case when order_action = 'buy' then 1 else 0 end) as buys
from (select o.*,
             sum(case when order_action = 'sell' then 1 else 0 end) over
                 (order by timestamp desc) as grp
      from orders o
     ) o
group by grp
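Here is a sketch of the aggregate query above, run through `sqlite3` (SQLite 3.25+) as an assumed stand-in for PostgreSQL, with the column renamed to ts. The `ORDER BY grp DESC` is an addition that lists groups oldest first. The helper name `group_summaries` is hypothetical.

```python
import sqlite3

# Sample rows from the question, with the column renamed to ts.
ORDERS = [
    ("buy", "2013-10-03 13:03:02"), ("buy", "2013-10-08 13:03:02"),
    ("sell", "2013-10-10 15:58:02"), ("buy", "2013-11-01 09:30:02"),
    ("buy", "2013-11-01 14:03:02"), ("sell", "2013-11-07 10:34:02"),
    ("buy", "2013-12-03 15:46:02"), ("sell", "2013-12-09 16:00:03"),
    ("buy", "2013-12-11 13:02:02"), ("sell", "2013-12-18 15:59:03"),
]

QUERY = """
SELECT min(ts) AS first_ts, max(ts) AS last_ts,
       sum(CASE WHEN order_action = 'buy' THEN 1 ELSE 0 END) AS buys
FROM (SELECT o.*,
             -- cumulative count of sells at or after this row
             sum(CASE WHEN order_action = 'sell' THEN 1 ELSE 0 END)
               OVER (ORDER BY ts DESC) AS grp
      FROM orders o) o
GROUP BY grp
ORDER BY grp DESC
"""


def group_summaries(rows):
    """Return (first_ts, last_ts, buys) per buy/sell group, oldest group first."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (order_action TEXT, ts TEXT)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for summary in group_summaries(ORDERS):
    print(summary)
```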

Source: https://stackoverflow.com/questions/24708067/grouping-based-on-sequence-of-rows

Tags

sql

postgresql

aggregate-functions

window-functions