SQL Select Inner join one by one

Posted by 橙三吉。 on 2019-12-11 12:32:14

Question


I have a specific query to run on my database (PostgreSQL v9.4.5), and I don't see any elegant solution in pure SQL. I know I could do it in Python or similar, but I have several billion rows of data, and the computation time would increase greatly.

I have two tables: trades and events. Both represent the trades occurring in an order book during a day (this is why I have several billion rows; my data spans several years), but there are many more events than trades.

Both tables have the columns time, price and volume, but each also has other columns (say foo and bar respectively) with specific information. I want to match the two tables on time, price and volume. I know this correspondence exists as an injection from trades to events: if there are n rows in trades with the same time t, the same price p and the same volume v, there are also n rows in events with time t, price p and volume v.

Trades:

  id |   time    |  price  | volume |   foo
-----+-----------+---------+--------+-------
 201 | 32400.524 |      53 |   2085 |   xxx
 202 | 32400.530 |      53 |   1162 |   xxx
 203 | 32400.531 |   52.99 |     50 |   xxx
 204 | 32400.532 |   52.91 |   3119 |   xxx
 205 | 32400.837 |   52.91 |   3119 |   xxx <--
 206 | 32400.837 |   52.91 |   3119 |   xxx <--
 207 | 32400.837 |   52.91 |   3119 |   xxx <--
 208 | 32400.839 |   52.92 |   3220 |   xxx <--
 209 | 32400.839 |   52.92 |   3220 |   xxx <--
 210 | 32400.839 |   52.92 |   3220 |   xxx <--

Events:

  id |   time    |  price  | volume |  bar 
-----+-----------+---------+--------+------
 328 | 32400.835 |   52.91 |   3119 |  yyy
 329 | 32400.837 |   52.91 |   3119 |  yyy <--
 330 | 32400.837 |   52.91 |   3119 |  yyy <--
 331 | 32400.837 |   52.91 |   3119 |  yyy <--
 332 | 32400.838 |   52.91 |   3119 |  yyy
 333 | 32400.838 |   52.91 |   3119 |  yyy
 334 | 32400.839 |   52.92 |   3220 |  yyy <--
 335 | 32400.839 |   52.92 |   3220 |  yyy <--
 336 | 32400.839 |   52.92 |   3220 |  yyy <--
 337 | 32400.840 |   52.91 |   2501 |  yyy

What I want is:

   time    |  price  | volume |  foo |   bar 
-----------+---------+--------+------+-------
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy

I cannot do a classic INNER JOIN, or else I will get every possible pairing within each group of matching rows (in this case 3x3 + 3x3 = 18 rows instead of the 6 I want).

The tough part is to pair each row with exactly one row, even though several rows could match.

Thank you for your help.

EDIT:

As I said, if I use a classic INNER JOIN, for example

SELECT * FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume

I will get something like:

trade_id | event_id |   time    |  price  | volume |  foo |   bar 
---------+----------+-----------+---------+--------+------+-------
  205    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  205    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  205    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  208    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  208    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  208    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy

But what I want is:

trade_id | event_id |   time    |  price  | volume |  foo |   bar 
---------+----------+-----------+---------+--------+------+-------
  205    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  208    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy

Answer 1:


Here is my example with row_number.

Also, SQL Fiddle: SO 33608351

with 
trades AS
(
    select 201 as id, 32400.524 as time, 53 as price,       2085 as volume, 'xxx' as foo union all
    select 202, 32400.530, 53,      1162,   'xxx' union all
    select 203, 32400.531, 52.99,       50,     'xxx' union all
    select 204, 32400.532, 52.91,       3119,   'xxx' union all
    select 205, 32400.837, 52.91,       3119,   'xxx' union all
    select 206, 32400.837, 52.91,       3119,   'xxx' union all
    select 207, 32400.837, 52.91,       3119,   'xxx' union all
    select 208, 32400.839, 52.92,       3220,   'xxx' union all
    select 209, 32400.839, 52.92,       3220,   'xxx' union all
    select 210, 32400.839, 52.92,       3220,   'xxx'
),
events as
(
    select 328 as id, 32400.835 as time ,   52.91 as price ,   3119 as volume ,  'yyy' as bar union all
    select 329 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 330 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 331 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 332 , 32400.838 ,   52.91 ,   3119 ,  'yyy' union all
    select 333 , 32400.838 ,   52.91 ,   3119 ,  'yyy' union all
    select 334 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 335 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 336 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 337 , 32400.840 ,   52.91 ,   2501 ,  'yyy'
),
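tradesWithRowNumber AS
(
    -- number the trades inside each (time, price, volume) group;
    -- ordering by id keeps the numbering deterministic
    select   *
            ,ROW_NUMBER() over (PARTITION by time, price, volume order by id) as RowNum
    from trades
),
eventsWithRowNumber AS
(
    -- same numbering for the events, so the n-th trade pairs with the n-th event
    select   *
            ,ROW_NUMBER() over (PARTITION by time, price, volume order by id) as RowNum
    from events
)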
select  t.time,
        t.price,
        t.volume,
        t.foo,
        e.bar
FROM    tradesWithRowNumber t
        inner JOIN
        eventsWithRowNumber e   on  e.time = t.time
                                AND e.price = t.price
                                AND e.volume = t.volume
                                and e.RowNum = t.RowNum
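
Row-number pairing with an inner join drops a trade whenever its (time, price, volume) group holds more trades than events. As a rough sketch of how the injection assumption from the question could be checked first (assuming the tables are named trades and events with the columns shown in the question; the aliases n_trades and n_events are made up for illustration):

```sql
-- Count rows per (time, price, volume) key in each table and list the keys
-- where trades outnumber events. An empty result means every trade has an
-- event available to pair with.
SELECT t.time, t.price, t.volume,
       t.n_trades,
       COALESCE(e.n_events, 0) AS n_events
FROM (
    SELECT time, price, volume, COUNT(*) AS n_trades
    FROM trades
    GROUP BY time, price, volume
) t
LEFT JOIN (
    SELECT time, price, volume, COUNT(*) AS n_events
    FROM events
    GROUP BY time, price, volume
) e
  ON e.time = t.time AND e.price = t.price AND e.volume = t.volume
WHERE t.n_trades > COALESCE(e.n_events, 0);
```

On the sample data in the question this should return no rows, since each group of trades has an equally sized group of events.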



Answer 2:


Check this query -

SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume



Answer 3:


Try this and let me know if it works. We could also use a row_number() over (partition by ...) clause, but I am not sure whether it will work on PostgreSQL. Anyway, try this.

SELECT 
  min(t.id) as trade_id,min(e.id) as event_id,
  min(t.time) as time,min(t.price) as price,
  min(t.volume) as volume,  min(e.bar) as bar,
  min(t.foo) as foo 
FROM events e
  INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
group by t.id



Answer 4:


Just looking at the sample data you have provided, one option would be:

SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo)
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume



Answer 5:


If I understand correctly, you just want to list the foo and bar columns without creating a Cartesian product. For this purpose, you can introduce a new column using row_number() and join on that:

SELECT *
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
      FROM events e
     ) e INNER JOIN
     (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
      FROM trades t
     ) t
     ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
        t.seqnum = e.seqnum;

Your question is unclear on whether you want an inner join, left outer join, or full outer join.
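
If trades without a counterpart should be kept rather than dropped, a LEFT JOIN starting from trades is one option. The following is only a sketch under the question's table and column names, reusing the seqnum idea; the aliases tr, ev, trade_id and event_id are introduced here for illustration:

```sql
-- Pair the n-th trade of each (time, price, volume) group with the n-th event
-- of the same group, keeping every trade even when no event is left for it.
SELECT t.id AS trade_id,
       e.id AS event_id,   -- NULL when this trade has no event to pair with
       t.time, t.price, t.volume, t.foo, e.bar
FROM (SELECT tr.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
      FROM trades tr
     ) t
LEFT JOIN
     (SELECT ev.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
      FROM events ev
     ) e
  ON e.time = t.time AND e.price = t.price AND e.volume = t.volume
 AND e.seqnum = t.seqnum
ORDER BY t.id;
```

With the sample data, trades 201 to 204 would come back with a NULL event_id and bar, because no event shares their time, price and volume. Adding WHERE e.id IS NOT NULL would reduce this to the inner-join result.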



Source: https://stackoverflow.com/questions/33607479/sql-select-inner-join-one-by-one
