Select finishes where athlete didn't finish first for the past 3 events

Submitted by 馋奶兔 on 2019-12-02 02:16:55
Erwin Brandstetter

I think this can be even simpler / faster:

SELECT day, place, athlete
FROM  (
   SELECT *, min(place) OVER (PARTITION BY athlete
                              ORDER BY day
                              ROWS 3 PRECEDING) AS best
   FROM   t
   ) sub
WHERE  best > 1

->SQLfiddle

This uses the aggregate function min() as a window function to get the minimum place over the current row and the three rows before it.
The check for "no win" (best > 1) is then trivial, but it has to be done on the next query level because window functions are applied after the WHERE clause. So you need at least one CTE or sub-select to put a condition on the result of a window function.
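To try the frame logic without a PostgreSQL instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the release that added window functions). The sample rows and the helper name no_recent_win are made up for illustration, and the frame end is spelled out as BETWEEN 3 PRECEDING AND CURRENT ROW rather than left to the default.

```python
import sqlite3

# Hypothetical sample data: (day, place, athlete)
SAMPLE = (
    [(d, p, "A") for d, p in enumerate([2, 3, 1, 2, 3, 4, 5], start=1)]
    + [(d, p, "B") for d, p in enumerate([1, 2, 2, 2, 2], start=1)]
)

QUERY = """
    SELECT day, place, athlete
    FROM (
        SELECT *, min(place) OVER (PARTITION BY athlete
                                   ORDER BY day
                                   ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS best
        FROM t
    ) sub
    WHERE best > 1          -- filter on the window result one level up
    ORDER BY athlete, day
"""

def no_recent_win(rows):
    """Return rows where the athlete has no win in the current or 3 preceding events."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (day INTEGER, place INTEGER, athlete TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Calling no_recent_win(SAMPLE) keeps an athlete's earliest rows as long as no win has occurred yet, because the frame simply contains fewer rows there. Whether those rows should count is a question for the original requirements.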

Details about window function calls are in the manual. In particular:

If frame_end is omitted it defaults to CURRENT ROW.

If place (finishing_pos) can be NULL, use this instead:

WHERE  best IS DISTINCT FROM 1

min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
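The NULL behaviour can be checked quickly with a sketch like the following, again using Python's sqlite3 module as a stand-in. One assumption to note: SQLite's IS NOT operator is used here as its null-safe comparison, playing the role of PostgreSQL's IS DISTINCT FROM, which older SQLite versions do not accept. The helper name scalar is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a one-value query and return that value (None stands for SQL NULL)."""
    return con.execute(sql).fetchone()[0]

# min() skips NULLs when at least one non-NULL value is present ...
mixed = scalar("SELECT min(x) FROM (SELECT NULL AS x UNION ALL SELECT 3)")

# ... but yields NULL when every input is NULL.
all_null = scalar("SELECT min(x) FROM (SELECT NULL AS x UNION ALL SELECT NULL)")

# An ordinary comparison with NULL is NULL, so WHERE best > 1 would drop the row.
plain = scalar("SELECT NULL > 1")

# The null-safe comparison is true instead, so the row would be kept.
null_safe = scalar("SELECT NULL IS NOT 1")
```

Here mixed is 3, all_null and plain are both None, and null_safe is 1 (true), which is why the null-safe form keeps rows whose whole frame is NULL.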

Don't use type names and reserved words as identifiers; I substituted day for your date.

This assumes at most one competition per day. Otherwise you have to define how to deal with peers in the timeline, or use timestamp instead of date.

@Craig already mentioned the index to make this fast.

Here's an alternative formulation that does the work in two scans without correlated subqueries:

SELECT
  "date", athlete, place
FROM (
  SELECT 
    "date",
    place,
    athlete,
    1 <> ALL (array_agg(place) OVER w) AS include_row
  FROM Table1
  WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;

See: http://sqlfiddle.com/#!1/fa3a4/34

The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.

Because the window frame is the only place where the number of rows of history to consider is defined, you can parameterise this variant unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than the last try.
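To make the "last n for any n" point concrete, here is a rough pure-Python sketch of the same frame logic, with the history length as a single parameter. It is not the SQL itself: the function name rows_without_recent_win and the tuple layout are assumptions for illustration, and it assumes place is never NULL.

```python
from collections import defaultdict, deque

def rows_without_recent_win(rows, n=3):
    """Keep rows where the athlete did not place 1st in the current event
    or in the n events before it.

    rows: iterable of (date, athlete, place) tuples; place is assumed non-NULL.
    """
    # One bounded history per athlete: current row plus n preceding rows,
    # mirroring ROWS BETWEEN n PRECEDING AND CURRENT ROW.
    history = defaultdict(lambda: deque(maxlen=n + 1))
    kept = []
    for date, athlete, place in sorted(rows, key=lambda r: (r[1], r[0])):
        window = history[athlete]
        window.append(place)
        # Counterpart of: 1 <> ALL (array_agg(place) OVER w)
        if all(p != 1 for p in window):
            kept.append((date, athlete, place))
    return kept
```

Changing n is the only adjustment needed to look further back or less far, which is the property the window-frame version has as well. With NULL places the SQL expression would evaluate to NULL rather than true, so the two would diverge there.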

I'd be really interested in the relative efficiency of this vs @Andomar's query when executed on a dataset of non-trivial size. They're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.

; with  CTE as
        (
        select  row_number() over (partition by athlete order by date) rn
        ,       *
        from    Table1
        )
select  *
from    CTE cur
where   not exists
        (
        select  *
        from    CTE prev
        where   prev.place = 1
                and prev.athlete = cur.athlete
                and prev.rn between cur.rn - 3 and cur.rn
        )

Live example at SQL Fiddle.
