问题
I would like to use the single SQL-statement
insert into T (...) select ... from T where ...
to copy a lot of datasets. My problem is that there are N:M-relationships from table T
to other tables and these have to be copied too. How can I do this, if I do not know which original dataset belongs to which copied dataset? Let me demonstrate by example.
Content of the database before:
Table T
:
ID | COL1 | COL2
-----------------
1 | A | B
2 | C | D
N:M-table references table U from table T (table U is not shown):
T | U
---------
1 | 100
1 | 101
2 | 100
2 | 102
My copy operation where [???] is the part I do not know:
insert into T (COL1, COL2) select COL1, COL2 from T
insert into NM (T, U) select [???]
Content of the database after:
Table T
:
ID | COL1 | COL2
-----------------
1 | A | B
2 | C | D
3 | A | B
4 | C | D
N:M-table:
T | U
---------
1 | 100
1 | 101
2 | 100
2 | 102
3 | 100
3 | 101
4 | 100
4 | 102
Notice:
- I have thousands of datasets (not just two)
- I want to use 'insert ... select' to get a better performance
回答1:
If you are lucky enough to run the current PostgreSQL 9.1, there is an elegant and fast solution with a single command using the new data-modifying CTEs.
No such luck with MySQL which does not support Common Table Expressions (CTE), not to mention data-modifying CTE.
Assuming (col1, col2)
is initially unique:
Query 1
- You can easily pick arbitrary slices from the table in this case.
- No sequence numbers for
t.id
will be wasted.
WITH s AS (
SELECT id, col1, col2
FROM t
-- WHERE some condition
)
,i AS (
INSERT INTO t (col1, col2)
SELECT col1, col2 -- I gather from comments that id is a serial column
FROM s
RETURNING id, col1, col2
)
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM tu
JOIN s ON tu.t = s.id
JOIN i USING (col1, col2);
If (col1, col2)
is not unique, I see two other ways:
Query 2
- Use the window function row_number() to make non-unique rows unique.
INSERT
rows without holes in thet.id
space just like in the query above.
WITH s AS (
SELECT id, col1, col2
, row_number() OVER (PARTITION BY col1, col2) AS rn
FROM t
-- WHERE some condition
)
,i AS (
INSERT INTO t (col1, col2)
SELECT col1, col2
FROM s
RETURNING id, col1, col2
)
,r AS (
SELECT *
, row_number() OVER (PARTITION BY col1, col2) AS rn
FROM i
)
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM r
JOIN s USING (col1, col2, rn) -- match exactly one id per row
JOIN tu ON tu.t = s.id;
Query 3
- This is based on the same idea that @ypercube already supplied, but all in one query.
- If there are holes in numbers space for current
t.id
, sequence numbers will be burnt for the new rows accordingly. - Don't forget to reset your sequence beyond the new maximum or you will get duplicate key errors for new inserts in
t
that draw the default forid
from the sequence. I integrated this as final step into the command. Fastest & safest this way.
WITH s AS (
SELECT max(id) AS max_id
FROM t
)
,i AS (
INSERT INTO t (id, col1, col2)
SELECT id + s.max_id, col1, col2
FROM t, s
)
,j AS (
INSERT INTO tu (t, u)
SELECT tu.t + s.max_id, tu.u
FROM tu, s
)
SELECT setval('t_id_seq', s.max_id + s.max_id)
FROM s;
Details about setval() in the manual.
Test setup
For a quick test.
CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES
('A', 'B')
,('C', 'D');
CREATE TEMP TABLE tu (t int, u int);
INSERT INTO tu VALUES
(1, 100)
,(1, 101)
,(2, 100)
,(2, 102);
SELECT * FROM t;
SELECT * FROM tu;
There was a somewhat similar question recently, where I provided a somewhat similar answer. Plus alternatives for version 8.3 without CTEs and window functions.
回答2:
Step 1. Lock (both) tables or make sure that only this script is running. Disable FK checks.
Step 2. Use these two INSERT
statements, in this order:
INSERT INTO NM
(T, U)
SELECT
T + maxID, U
FROM
NM
CROSS JOIN
( SELECT MAX(ID) AS maxID
FROM T
) AS m
INSERT INTO T
(ID, COL1, COL2)
SELECT
ID+maxID, COL1, COL2
FROM
T
CROSS JOIN
( SELECT MAX(ID) AS maxID
FROM T
) AS m
Step 3. Re-enable FKs.
来源:https://stackoverflow.com/questions/10190409/copy-datasets-with-nm-relation