问题
I have a specific request to do on my database (PostgreSQL v9.4.5), and I don't see any elegant solution in pure SQL to solve it (I know I can do it using Python or other, but I have several billions lines of data, and the calculation time would be greatly increased).
I have two tables : trades and events. These tables both represent the trades occurring in an orderbook during a day (this is why I have several billions lines, my data is over several years) but there are many more events than trades.
Both tables have columns time, volume and quantity, however each one has other columns (let's say respectively foo and bar) with specific information. I want to make a correspondence between the two tables on the columns time, volume and price, as I know this correspondence exists as an injection from trades to events (if there are n rows in trades with the same time t, the same price p and the same volume v, I know there are also n rows in events with the time t, the price p and the volume v).
Trades :
id | time | price | volume | foo
-----+-----------+---------+--------+-------
201 | 32400.524 | 53 | 2085 | xxx
202 | 32400.530 | 53 | 1162 | xxx
203 | 32400.531 | 52.99 | 50 | xxx
204 | 32400.532 | 52.91 | 3119 | xxx
205 | 32400.837 | 52.91 | 3119 | xxx <--
206 | 32400.837 | 52.91 | 3119 | xxx <--
207 | 32400.837 | 52.91 | 3119 | xxx <--
208 | 32400.839 | 52.92 | 3220 | xxx <--
209 | 32400.839 | 52.92 | 3220 | xxx <--
210 | 32400.839 | 52.92 | 3220 | xxx <--
Events :
id | time | price | volume | bar
-----+-----------+---------+--------+------
328 | 32400.835 | 52.91 | 3119 | yyy
329 | 32400.837 | 52.91 | 3119 | yyy <--
330 | 32400.837 | 52.91 | 3119 | yyy <--
331 | 32400.837 | 52.91 | 3119 | yyy <--
332 | 32400.838 | 52.91 | 3119 | yyy
333 | 32400.838 | 52.91 | 3119 | yyy
334 | 32400.839 | 52.92 | 3220 | yyy <--
335 | 32400.839 | 52.92 | 3220 | yyy <--
336 | 32400.839 | 52.92 | 3220 | yyy <--
337 | 32400.840 | 52.91 | 2501 | yyy
What I want is :
time | price | volume | bar | foo
-----------+---------+--------+------+-------
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
I cannot do a classic INNER JOIN, or else I will have all the possible crossing between the two tables (in this case I would have 6x6 then 36 rows).
The though thing is to have only one row versus one row, although several rows could fit.
Thank you for your help.
EDIT :
As I said, if I use a classic INNER JOIN, for example
SELECT * FROM events e,
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
I will have something like :
trade_id | event_id | time | price | volume | bar | foo
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 329 | 32400.839 | 52.91 | 3119 | xxx | yyy
207 | 330 | 32400.839 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.839 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.837 | 52.92 | 3220 | xxx | yyy
208 | 335 | 32400.837 | 52.92 | 3220 | xxx | yyy
208 | 336 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 334 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 336 | 32400.837 | 52.92 | 3220 | xxx | yyy
210 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
But what I want is :
trade_id | event_id | time | price | volume | bar | foo
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.839 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.837 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
回答1:
Here is my example with row_number.
Also, SQL Fiddle: SO 33608351
with
trades AS
(
select 201 as id, 32400.524 as time, 53 as price, 2085 as volume, 'xxx' as foo union all
select 202, 32400.530, 53, 1162, 'xxx' union all
select 203, 32400.531, 52.99, 50, 'xxx' union all
select 204, 32400.532, 52.91, 3119, 'xxx' union all
select 205, 32400.837, 52.91, 3119, 'xxx' union all
select 206, 32400.837, 52.91, 3119, 'xxx' union all
select 207, 32400.837, 52.91, 3119, 'xxx' union all
select 208, 32400.839, 52.92, 3220, 'xxx' union all
select 209, 32400.839, 52.92, 3220, 'xxx' union all
select 210, 32400.839, 52.92, 3220, 'xxx'
),
events as
(
select 328 as id, 32400.835 as time , 52.91 as price , 3119 as volume , 'yyy' as bar union all
select 329 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 330 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 331 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 332 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 333 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 334 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 335 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 336 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 337 , 32400.840 , 52.91 , 2501 , 'yyy'
),
tradesWithRowNumber AS
(
select *
,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
from trades
),
eventsWithRowNumber AS
(
select *
,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
from events
)
select t.time,
t.price,
t.volume,
t.foo,
e.bar
FROM tradesWithRowNumber t
inner JOIN
eventsWithRowNumber e on e.time = t.time
AND e.price = t.price
AND e.volume = t.volume
and e.RowNum = t.RowNum
回答2:
Check this query -
SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume
回答3:
Try this and let me know if. We can also you row_number() over(partion by)
clause but I am not sure if it will work on postgreSQL. Anyways try this.
SELECT
min(t.id) as trade_id,min(e.id) as event_id,
min(t.time) as time,min(t.price) as price,
min(t.volume) as volume, min(e.bar) as bar,
min(t.foo) as foo
FROM events e,
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
group by t.id
回答4:
Just looking at the sample data you have provided, one option would be:
SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo) FROM events e,
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume
回答5:
If I understand correctly, you just want to list the foo
and bar
columns without creating a Cartesian product. For this purpose, you can introduce a new column using row_number()
and join on that:
SELECT *
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM events e
) e INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as FROM trades t
seqnum
) t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
t.seqnum = e.seqnum;
Your question is unclear on whether you want an inner join, left outer join, or full outer join.
来源:https://stackoverflow.com/questions/33607479/sql-select-inner-join-one-by-one