Selecting all records created less than 1 second apart

折月煮酒 提交于 2019-12-04 17:51:33

You need to check for existence of a row within a second in both directions.
And you need to exclude the row itself from the test:

SELECT *
FROM   purchase p
WHERE  EXISTS (
   SELECT 1
   FROM  purchase p1
   WHERE p1.created > p.created - interval '1 sec' -- "less than a second"
   AND   p1.created < p.created + interval '1 sec'
   AND   p1.account_id = p.account_id
   AND   p1.price      = p.price
   AND   p1.transaction_id <> p.transaction_id   -- assuming that's the pk
   )
ORDER BY account_id, price, created;         -- optional, for handy output

These WHERE conditions are sargable, which allows the use of an index on created:

WHERE p1.created > p.created - interval '1 sec'
AND   p1.created < p.created + interval '1 sec'

As opposed to:

p1.created - p.created < interval '1 sec'

The latter cannot use an index on created which may slow down the query with big tables. Postgres is forced to test all remaining combinations (after applying other conditions). Depending on the selectivity of the other conditions and the size of the table, this may be irrelevant or a moderate to huge performance drain.
With small to medium tables, tests have shown two sequence scans and a hash semi join for either query.

Index

The perfect index for the case would be a multicolumn index of the form:

CREATE INDEX purchase_foo_idx ON purchase (account_id, price, created)

But combinations of indexes on individual columns work rather well, too (and might have more use cases).

I think you're looking for something like this:

select *
from purchase p1
where exists (
    select transaction_id 
    from purchase p2 
    where p2.created > p1.created
    and p2.created - p1.created < interval '1 second'
    and p2.account_id = p1.account_id
    and p2.price = p1.price)

EDIT: The query could be very heavy on big table. Consider limit it e.g. to one day:

select *
from purchase p1
where 
    p1.created::date = '2014-05-08'
    and exists (
        select transaction_id 
        from purchase p2 
        where p2.created::date = '2014-05-08'
        and p2.created > p1.created
        and p2.created - p1.created < interval '1 second'
        and p2.account_id = p1.account_id
        and p2.price = p1.price)

SQL Fiddle

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!