问题
I have stock_price_alert table with 3 columns. stock_price_id is PRIMARY KEY & also FOREIGN KEY to other table. Table definition as below:
create table stock_price_alert (
stock_price_id integer references stock_price (id) on delete cascade not null,
fall_below_alert boolean not null,
rise_above_alert boolean not null,
primary key (stock_price_id)
);
I need to either:
1) INSERT record if not exist
-- query 1
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (1, true, false);
2) UPDATE record if exist
-- query 2
UPDATE stock_price_alert SET
fall_below_alert = true,
rise_above_alert = false
WHERE stock_price_id = 1;
First I need to issue SELECT query on stock_price_alert table, in order to decide whether to perform query (1) or (2).
Postgres supports INSERT INTO TABLE .... ON CONFLICT DO UPDATE ...:
-- query 3
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (1, true, false)
ON CONFLICT (stock_price_id) DO UPDATE SET
fall_below_alert = EXCLUDED.fall_below_alert,
rise_above_alert = EXCLUDED.rise_above_alert;
Instead of using query (1) or (2), can I always use query (3)? Then I don't need to issue SELECT query in prior & it helps to simplify the code.
But I am wondering, which is the best practice? Will query (3) cause performance issue or unwanted side effect? Thanks.
回答1:
Query 3 is the Postgres syntax for "UPSERT" (= UPDATE or INSERT), introduced in Postgres 9.5.
From the documentation:
ON CONFLICT DO UPDATEguarantees an atomicINSERTorUPDATEoutcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency. This is also known asUPSERT– “UPDATEorINSERT”.
This is the best practice for what you are trying to achieve.
回答2:
I noticed/tested that is much faster for INSERTS (have yet to test UPSERTS) to use a WHERE NOT EXISTS in addition to ON CONFLICT. Typically about 3x faster than just allowing the ON CONFLICT to handle existence checks. I think this may carry over into UPSERTS, making it likely faster to do an INSERT and then and UPDATE. Here is my test for inserts only...
--so i can keep rerunning
DROP TABLE if exists temp1;
DROP TABLE if exists temp2;
--create a billion rows
SELECT GENERATE_SERIES AS id INTO TEMP temp1
FROM GENERATE_SERIES(1, 10000000);
CREATE UNIQUE INDEX ux_id ON temp1(id);
ALTER TABLE temp1 CLUSTER ON ux_id;
--create a second table to insert from, with the same data
SELECT * INTO TEMP temp2
FROM temp1;
CREATE UNIQUE INDEX ux_id2 ON temp2(id);
ALTER TABLE temp2 CLUSTER ON ux_id2;
--test inserting with on conflict only
INSERT INTO temp1(id)
SELECT id
FROM temp2 ON conflict DO nothing;
--execution time: 14.71s (1million rows)
--test inserting with not exists and on conflict
INSERT INTO temp1(id)
SELECT t2.id
FROM temp2 t2
WHERE NOT EXISTS (SELECT 1 FROM temp1 t1 WHERE t2.id = t1.id)
ON conflict DO nothing;
--execution time: 5.78s (1million rows)
来源:https://stackoverflow.com/questions/48922972/postgres-insert-on-conflict-do-update-vs-insert-or-update