Storing and comparing unique combinations

问题

I need search functionality on a website where among other things you should be able to select multiple categories. The searches will be stored in the database but each unique combination of search parameters should only be stored once, this also includes the unique combination of selected categories.

The problem is that I cannot figure out how to store the combinations of selected categories. I have looked at arrays and found this http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/ but apparently that feature was never implemented.

So if I need to use multiple tables I was thinking I could have one table for the combinations that has an id for each unique combination that can be referenced and compared easily and then another table linking the categories to the combination. But with this approach how can I check if a combination already exists? The only way I can think of is to loop through all existing combinations and check if any one of those equal the compared combination.

I feel like this cannot be such an uncommon problem but I can't find any examples that do this. I also feel like my approach may not be the best. Any suggestions are very much welcome.

I have these two tables currently:

Categories
- CategoryId (int)
- Name (string)

Searches
- SearchId (int)
- Keywords (string)
- ExampleOption1 (bool)
- ExampleOption2 (bool)
- CategoriesCombinationId (int) -- this would represent the unique combination of categories and links to the combination table

And this is how I might try to solve the problem (if there is a good way to check if a combination already exists):

CategoriesCombinations -- unique combinations
- CombinationId (int)

CombinedCategories
- CombinationId (int) -- links to id in combinations table
- CategoryId (int) -- links to id in categories table

回答1:

Store as array (denormalized)

I would consider the additional module intarray that provides the convenient (and fast) functions uniq() and sort(). In a typical modern Postgres installation it's as easy as:

CREATE EXTENSION intarray;

How to create an index for elements of an array in PostgreSQL?

Using these, a simple CHECK constraint can enforce ascending arrays with distinct elements.

CHECK (uniq(sort(cat_arr)) = cat_arr)

You can additionally (optionally) have a trigger that normalizes array values ON INSERT OR UPDATE automatically. Then you can just pass any array (possibly unsorted and with dupes) and everything just works. Like:

CREATE OR REPLACE FUNCTION trg_search_insup_bef()
  RETURNS trigger AS
$func$
BEGIN
   NEW.cat_arr := uniq(sort(NEW.cat_arr);
   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER insup_bef
BEFORE INSERT OR UPDATE OF cat_arr ON search
FOR EACH ROW
EXECUTE PROCEDURE trg_search_insup_bef();

The additional module intarray is optional, there are other ways:

Ordering distinct column values by (first value of) other column in aggregate function

But the intarray functions deliver superior performance.

Then you can just create a UNIQUE constraint on the array column to enforce uniqueness of the whole array.

UNIQUE (cat_arr)

I wrote more about the advantages of combining (very strict and reliable) constraints with (less reliable but more convenient) triggers in this related answer just two days ago:

Cross table constraints in PostgreSQL

If, for each combination, all you need to store per category is the ID (and no additional info), this should be good enough.
However, referential integrity is not easily ensured this way. There are no foreign key constraints for array elements (yet) - like documented in your link: If one of the categories is deleted or you change IDs, references break ...

Normalized schema

If you need to store more or you'd rather go with a normalized schema to enforce referential integrity or for some reason, you can do that, too, and add a trigger to populate a hand-made materialized view (a redundant table) and enforce uniqueness in a similar way:

CREATE TABLE search (
  search_id serial PRIMARY KEY
, ... more columns
);

CREATE TABLE cat (
  cat_id serial PRIMARY KEY
, cat text NOT NULL
);

CREATE TABLE search_cat (
  search_id int REFERENCES search ON DELETE CASCADE
, cat_id    int REFERENCES cat
, PRIMARY KEY (search_id, cat_id)
);

Related answer (not for unique combinations, but for unique elements) that demonstrates the trigger:

Can PostgreSQL have a uniqueness constraint on array elements?

回答2:

For your current approach you can use string_agg to have a string representation of all Categories in each CategoriesCombinations and check that against the new search:

SELECT CombinationId
FROM CombinedCategories
WHERE string_agg(CategoryId, ',') = '84,95,102'
GROUP BY CombinationId

But the simpler approach would be calculating a unique hash for each search based on all parameters and store that in Searches table and compare hash of the new search against search history.

来源：https://stackoverflow.com/questions/29732650/storing-and-comparing-unique-combinations

标签

sql

arrays

postgresql

database-design

unique-constraint