Do I need a primary key for my table, which has a UNIQUE (composite 4-columns), one of which can be NULL?

你离开我真会死。 submitted on 2019-11-28 08:38:10
Erwin Brandstetter

Should I use a "serial" primary key just in case I ever need one?

You can easily add a serial column later if you need one:

ALTER TABLE product_pricebands ADD COLUMN id serial;

The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined yet):

ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;

If you reference the table from other tables, I would advise using such a surrogate primary key, because linking by four columns is rather unwieldy. Joining on four columns is also slower in SELECT queries.

Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
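To illustrate the duplicate problem, here is a minimal sketch on a throwaway table. The table name demo_pricebands and all column types are assumptions for the demo, not taken from the question:

```sql
-- Hypothetical demo table; the column types are guesses.
CREATE TEMP TABLE demo_pricebands (
    template_sku text    NOT NULL,
    siteid       integer NOT NULL,
    currencyid   integer NOT NULL,
    colourid     integer,            -- nullable, as in the question
    UNIQUE (template_sku, siteid, currencyid, colourid)
);

INSERT INTO demo_pricebands VALUES ('sku1', 1, 1, NULL);
-- Accepted as well: the two NULLs are not considered equal,
-- so the UNIQUE constraint sees no conflict.
INSERT INTO demo_pricebands VALUES ('sku1', 1, 1, NULL);

-- Both "duplicate" rows are stored.
SELECT count(*) FROM demo_pricebands;
```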


As "the colourid field can be NULL", you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY because of the nullable colourid, but you can create a UNIQUE constraint like the one you already have (which creates an index automatically):

ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid);

This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):

CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

That covers all bases. I wrote more about that technique in a related answer on dba.SE.
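A minimal sketch of how the UNIQUE constraint and the partial unique index work together, again on a throwaway table. The names demo_pricebands_fixed and demo_pricebands_fixed_null_idx and the column types are assumptions:

```sql
-- Hypothetical demo table; the column types are guesses.
CREATE TEMP TABLE demo_pricebands_fixed (
    template_sku text    NOT NULL,
    siteid       integer NOT NULL,
    currencyid   integer NOT NULL,
    colourid     integer,
    UNIQUE (template_sku, siteid, currencyid, colourid)
);

-- Partial unique index: applies only to rows without a colour.
CREATE UNIQUE INDEX demo_pricebands_fixed_null_idx
ON demo_pricebands_fixed (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

INSERT INTO demo_pricebands_fixed VALUES ('sku1', 1, 1, NULL);  -- accepted
INSERT INTO demo_pricebands_fixed VALUES ('sku1', 1, 1, 7);     -- accepted: has a colourid
-- Expected to be rejected with a unique violation raised by the partial index:
INSERT INTO demo_pricebands_fixed VALUES ('sku1', 1, 1, NULL);
```

Rows with a colourid are checked by the four-column constraint, rows without one by the partial index, so every row is covered by exactly one of the two.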


The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.
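A sketch of that alternative, under loudly stated assumptions: 0 is not a real colourid and can stand in for "no colour", and no foreign key on colourid forbids that value. Adapt the placeholder to your data:

```sql
BEGIN;

-- Assumption: 0 is a free placeholder value meaning "no colour".
UPDATE product_pricebands SET colourid = 0 WHERE colourid IS NULL;

ALTER TABLE product_pricebands ALTER COLUMN colourid SET NOT NULL;

-- The primary key replaces the UNIQUE constraint and the partial index.
ALTER TABLE product_pricebands DROP CONSTRAINT product_pricebands_uni_idx;
DROP INDEX IF EXISTS product_pricebands_uni_null_idx;

ALTER TABLE product_pricebands
ADD PRIMARY KEY (template_sku, siteid, currencyid, colourid);

COMMIT;
```

If the table already holds several rows that differ only by a NULL colourid, the UPDATE fails with a unique violation and the transaction rolls back, which at least surfaces the duplicates before the key is created.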


Also, as you "basically DELETE most of the data" for your refill operation, it will be faster to drop indexes that are not needed during the refill operation and recreate them afterwards. Building an index from scratch is much faster, often by an order of magnitude, than adding all rows incrementally.
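The pattern could look roughly like this. The index product_pricebands_siteid_idx is hypothetical and only stands for any index that the refill itself does not need:

```sql
-- Hypothetical secondary index that is not needed while refilling.
DROP INDEX IF EXISTS product_pricebands_siteid_idx;

-- ... run the DELETE and the bulk INSERT of the refill here ...

-- Rebuild the index in one pass over the final data.
CREATE INDEX product_pricebands_siteid_idx
ON product_pricebands (siteid);

-- Refresh planner statistics after the bulk change.
ANALYZE product_pricebands;
```

Unique indexes that guard the refill against duplicates are a different matter: dropping them also drops the check while the data is loaded.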

How do you know which indexes are used (needed)?

  • Test your queries with EXPLAIN ANALYZE.
  • Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.
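Both approaches can be sketched in plain SQL. The WHERE clause values are made up, and template_sku is assumed to be a text column:

```sql
-- Shows the actual plan, including which index (if any) is used.
EXPLAIN ANALYZE
SELECT *
FROM   product_pricebands
WHERE  template_sku = 'sku1'   -- hypothetical values
AND    siteid = 1
AND    currencyid = 1;

-- Built-in statistics: how often each index of the table has been scanned.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM   pg_stat_user_indexes
WHERE  relname = 'product_pricebands'
ORDER  BY idx_scan;
```

An index whose idx_scan stays at 0 over a representative period is a candidate for dropping, unless it exists to enforce a constraint.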

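It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Whether this is an option depends on your foreign keys: TRUNCATE refuses to run on a table that is referenced by foreign keys from other tables unless those tables are truncated in the same command. It would look like this: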

CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;

TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;

This avoids a lot of vacuuming.
