What is the best design for a one-to-many relationship with back references to each other?

Question

I am trying to find the best design for an SQL database schema for a one-to-many relationship. In my project I have objects that consist of a number of nodes, and I would like each object to have an optional foreign key reference to its root node (root_node). So my initial solution looks like this (for clarity I am skipping the table-creation dependency problem):

-- schema A

CREATE TABLE objects (
   object_id integer NOT NULL PRIMARY KEY,
   root_node integer REFERENCES nodes(node_id),
    <some other data>
);

CREATE TABLE nodes (
   node_id integer NOT NULL PRIMARY KEY,
   object_id integer REFERENCES objects,
   <some other data>
);
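The "dependency problem" in schema A is that each table needs the other to exist before a row can be inserted. A minimal sketch, assuming SQLite as a stand-in for the real database (the helper names are hypothetical): declaring both foreign keys DEFERRABLE INITIALLY DEFERRED postpones the check until COMMIT, so an object and its root node can be inserted in either order inside one transaction.

```python
import sqlite3


def build_schema_a() -> sqlite3.Connection:
    """Schema A: objects.root_node -> nodes, nodes.object_id -> objects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE objects (
            object_id INTEGER NOT NULL PRIMARY KEY,
            root_node INTEGER REFERENCES nodes(node_id)
                DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE nodes (
            node_id   INTEGER NOT NULL PRIMARY KEY,
            object_id INTEGER REFERENCES objects(object_id)
                DEFERRABLE INITIALLY DEFERRED
        );
    """)
    return conn


def create_object_with_root(conn, object_id, node_id):
    # Both rows go into one transaction; the deferred FKs are checked at COMMIT,
    # so inserting the object before its root node is allowed.
    with conn:
        conn.execute("INSERT INTO objects VALUES (?, ?)", (object_id, node_id))
        conn.execute("INSERT INTO nodes VALUES (?, ?)", (node_id, object_id))


conn = build_schema_a()
create_object_with_root(conn, 1, 10)
print(conn.execute("SELECT root_node FROM objects WHERE object_id = 1").fetchone())  # (10,)
```

Not every database supports deferrable constraints (SQL Server, which the first answer's syntax targets, does not), so there the usual workaround is to insert the object with a NULL root_node first and set it afterwards.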

However, now we have two tables with foreign key references to each other, which I am not sure is a good thing. So I am considering another approach where, instead of putting root_node inside the objects table, it is stored separately in a root_nodes table:

-- schema B

CREATE TABLE objects (
   object_id integer NOT NULL PRIMARY KEY,
    <some other data>
);

CREATE TABLE root_nodes (
   object_id integer REFERENCES objects PRIMARY KEY,
   root_node integer REFERENCES nodes(node_id)
);

CREATE TABLE nodes (
   node_id integer NOT NULL PRIMARY KEY,
   object_id integer REFERENCES objects,
   <some other data>
);
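In schema B the references no longer form a cycle (nodes and root_nodes point to objects, root_nodes also points to nodes), so the tables can be created and filled in plain dependency order, at the cost of an extra join whenever the root is needed. A minimal sketch, assuming SQLite and omitting the "other data" columns; objects_with_roots is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE objects (object_id INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        object_id INTEGER REFERENCES objects(object_id)
    );
    -- At most one row per object, because object_id is the primary key.
    CREATE TABLE root_nodes (
        object_id INTEGER PRIMARY KEY REFERENCES objects(object_id),
        root_node INTEGER REFERENCES nodes(node_id)
    );
""")

with conn:
    conn.executemany("INSERT INTO objects VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])
    conn.execute("INSERT INTO root_nodes VALUES (1, 10)")  # object 2 has no root yet


def objects_with_roots(conn):
    # LEFT JOIN keeps objects that have no row in root_nodes.
    return conn.execute("""
        SELECT o.object_id, r.root_node
        FROM objects AS o
        LEFT JOIN root_nodes AS r ON r.object_id = o.object_id
        ORDER BY o.object_id
    """).fetchall()


print(objects_with_roots(conn))  # [(1, 10), (2, None)]
```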

So my question is: are both designs A and B considered acceptable, or is there a known 'best practice' that prefers one over the other? If so, could you please explain why one schema is better?

Answer 1:

In schema B you can have multiple root nodes for an object, and the root node can be a node of another object. Schema A allows at most one root node per object (which is what we want, I guess), but shares the second issue. I do not know if there is a "best practice" for this, but here are some ideas.
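The "root node of another object" issue can be reproduced directly. A minimal sketch of schema A, assuming SQLite: the single-column foreign key only checks that the referenced node exists, not who owns it. mismatched_roots is a hypothetical audit query, not part of either schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE objects (
        object_id INTEGER NOT NULL PRIMARY KEY,
        root_node INTEGER REFERENCES nodes(node_id)
    );
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        object_id INTEGER REFERENCES objects(object_id)
    );
""")

with conn:
    # Objects start without a root, so no ordering problem arises.
    conn.executemany("INSERT INTO objects VALUES (?, NULL)", [(1,), (2,)])
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(10, 1), (20, 2)])
    # Node 20 belongs to object 2, yet it is accepted as object 1's root.
    conn.execute("UPDATE objects SET root_node = 20 WHERE object_id = 1")


def mismatched_roots(conn):
    # (object, its root node, the object that actually owns that node)
    return conn.execute("""
        SELECT o.object_id, o.root_node, n.object_id
        FROM objects AS o
        JOIN nodes AS n ON n.node_id = o.root_node
        WHERE n.object_id <> o.object_id
    """).fetchall()


print(mismatched_roots(conn))  # [(1, 20, 2)]
```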

If you need more root nodes per object, it is actually very simple to do; you just need a bit flag:

CREATE TABLE objects (
   object_id integer NOT NULL PRIMARY KEY,   
    <some other data>
);

CREATE TABLE nodes (
   node_id integer NOT NULL PRIMARY KEY,
   object_id integer REFERENCES objects,
   is_root bit NOT NULL,
   <some other data>
);
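A minimal sketch of this flag-based schema, assuming SQLite: SQLite has no bit type, so an INTEGER restricted to 0/1 by a CHECK constraint stands in for it. It shows the root node being created and flagged in a single INSERT; root_nodes is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE objects (object_id INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        object_id INTEGER REFERENCES objects(object_id),
        -- Stand-in for "is_root bit NOT NULL": only 0 or 1 is allowed.
        is_root   INTEGER NOT NULL DEFAULT 0 CHECK (is_root IN (0, 1))
    );
""")

with conn:
    conn.execute("INSERT INTO objects VALUES (1)")
    conn.execute("INSERT INTO nodes VALUES (10, 1, 1)")  # root node: one INSERT
    conn.execute("INSERT INTO nodes (node_id, object_id) VALUES (11, 1)")  # ordinary node


def root_nodes(conn, object_id):
    rows = conn.execute(
        "SELECT node_id FROM nodes WHERE object_id = ? AND is_root = 1 ORDER BY node_id",
        (object_id,),
    )
    return [row[0] for row in rows]


print(root_nodes(conn, 1))  # [10]
```

Without a further constraint, nothing stops a second node from being flagged, which is exactly what the "multiple root nodes" case needs.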

If you want only one root node per object, you can add a filtered unique index:

CREATE UNIQUE NONCLUSTERED INDEX unique_root_for_object ON nodes
(
    object_id
)
WHERE (is_root = 1)
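The filtered index is SQL Server syntax; SQLite calls the same idea a partial index. A minimal sketch, assuming SQLite: only rows with is_root = 1 enter the index, so uniqueness on object_id applies to root nodes only. try_add_second_root is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE objects (object_id INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        object_id INTEGER REFERENCES objects(object_id),
        is_root   INTEGER NOT NULL DEFAULT 0 CHECK (is_root IN (0, 1))
    );
    -- Partial unique index: at most one is_root = 1 row per object_id.
    CREATE UNIQUE INDEX unique_root_for_object ON nodes (object_id) WHERE is_root = 1;
""")

with conn:
    conn.execute("INSERT INTO objects VALUES (1)")
    conn.execute("INSERT INTO nodes VALUES (10, 1, 1)")
    conn.execute("INSERT INTO nodes VALUES (11, 1, 0)")  # non-root rows are unrestricted


def try_add_second_root(conn):
    try:
        with conn:  # rolls back if the INSERT fails
            conn.execute("INSERT INTO nodes VALUES (12, 1, 1)")
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    return "accepted"


print(try_add_second_root(conn))  # starts with "rejected"
```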

Let's call this schema C for now. Now let's return to schema A and fix the "root from a different object" issue. You can add a composite foreign key that forces the root node to be one of the object's own nodes:

ALTER TABLE objects WITH CHECK ADD
CONSTRAINT FK_objects_nodes FOREIGN KEY (object_id, root_node)
REFERENCES nodes (object_id, node_id);

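You would need a unique index (or unique constraint) on (object_id, node_id) in table nodes for this to work. You can still have objects without a root node, of course: a composite foreign key is not checked when any of its columns is NULL, so rows with a NULL root_node do not violate it.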
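A minimal sketch of schema A with the composite foreign key, assuming SQLite in place of SQL Server (SQLite needs the constraint in CREATE TABLE rather than ALTER TABLE). It shows a root owned by another object being rejected while a NULL root is accepted; set_root is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE objects (
        object_id INTEGER NOT NULL PRIMARY KEY,
        root_node INTEGER,
        -- The (object_id, root_node) pair must exist as (object_id, node_id) in nodes.
        FOREIGN KEY (object_id, root_node) REFERENCES nodes (object_id, node_id)
    );
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        object_id INTEGER REFERENCES objects(object_id),
        -- The referenced column pair must be unique for the composite FK.
        UNIQUE (object_id, node_id)
    );
""")

with conn:
    # root_node is NULL, so the composite FK is not checked for these rows.
    conn.executemany("INSERT INTO objects (object_id) VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(10, 1), (20, 2)])
    conn.execute("UPDATE objects SET root_node = 10 WHERE object_id = 1")  # own node


def set_root(conn, object_id, node_id):
    try:
        with conn:
            conn.execute(
                "UPDATE objects SET root_node = ? WHERE object_id = ?",
                (node_id, object_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True


print(set_root(conn, 1, 20))  # False: node 20 belongs to object 2
print(set_root(conn, 2, 20))  # True
```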

Which is better, schema A or C? Schema C seems more flexible: you can add a node as the root node in a single insert, for example, and you can easily switch to a "multiple root nodes" scenario. Schema A, on the other hand, allows you to create an index on objects that includes the root node. When logging changes, a change of root node would be logged as a change of the object, not a change of the node. The dependency is also more explicit, which would simplify some queries a bit, and ORMs would handle it well too.

There may be other ways to do this. As a rule of thumb, I would try to stick to schemas that do not allow data inconsistency by design.

Answer 2:

In this scenario, you're enforcing an additional constraint on your data that you aren't making the DB aware of through your table definitions.
Either you need to add a trigger to enforce the constraint, or you leave out the root node information completely and find it programmatically when needed.
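A minimal trigger sketch, assuming SQLite and schema A, enforcing one such constraint: an object's root node must belong to that object. The trigger names are hypothetical, and a production version would need the same care for every statement that can break the rule (here: inserting or updating objects, and moving a node to another object).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE objects (
        object_id INTEGER NOT NULL PRIMARY KEY,
        root_node INTEGER REFERENCES nodes(node_id)
    );
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        object_id INTEGER REFERENCES objects(object_id)
    );
    -- Reject a new object whose root node is owned by someone else.
    CREATE TRIGGER objects_root_check_insert BEFORE INSERT ON objects
    WHEN NEW.root_node IS NOT NULL
     AND (SELECT object_id FROM nodes WHERE node_id = NEW.root_node) IS NOT NEW.object_id
    BEGIN SELECT RAISE(ABORT, 'root node belongs to another object'); END;
    -- Same check when the root node is changed.
    CREATE TRIGGER objects_root_check_update BEFORE UPDATE OF root_node ON objects
    WHEN NEW.root_node IS NOT NULL
     AND (SELECT object_id FROM nodes WHERE node_id = NEW.root_node) IS NOT NEW.object_id
    BEGIN SELECT RAISE(ABORT, 'root node belongs to another object'); END;
    -- Moving a node that is some object's root would break that object.
    CREATE TRIGGER nodes_keep_root_owner BEFORE UPDATE OF object_id ON nodes
    WHEN EXISTS (SELECT 1 FROM objects
                 WHERE root_node = NEW.node_id AND object_id IS NOT NEW.object_id)
    BEGIN SELECT RAISE(ABORT, 'node is the root of another object'); END;
""")


def attempt(conn, sql, params=()):
    """Run one statement in its own transaction; report whether it was accepted."""
    try:
        with conn:
            conn.execute(sql, params)
    except sqlite3.DatabaseError:
        return False
    return True


with conn:
    conn.executemany("INSERT INTO objects VALUES (?, NULL)", [(1,), (2,)])
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(10, 1), (20, 2)])

print(attempt(conn, "UPDATE objects SET root_node = 10 WHERE object_id = 1"))  # True
print(attempt(conn, "UPDATE objects SET root_node = 20 WHERE object_id = 1"))  # False
print(attempt(conn, "UPDATE nodes SET object_id = 2 WHERE node_id = 10"))      # False
```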

The biggest problem with your current schema is that you can change information in one table that makes the information in the other semantically wrong.

Ex:

A -> B -> C  
A is the root node of C

I can break this in either of your formats by updating something in one table but neglecting to do so in the other. I could update C's root node to be B, but forget to remove the parent-child relationship between A and B. Or I could add a parent to A and forget to update the root node information for B or C.

My suggestion is to not store the root-node data and compute it when it's needed.
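Computing the root on demand is usually done with a recursive query that walks up the parent links. A minimal sketch, assuming SQLite and a hypothetical parent_id column on nodes (the question's schema has no parent link, but the A -> B -> C example implies one); it also assumes the hierarchy has no cycles, otherwise the recursion would not terminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- parent_id is hypothetical: NULL marks a node without a parent.
    CREATE TABLE nodes (
        node_id   INTEGER NOT NULL PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES nodes(node_id)
    );
    INSERT INTO nodes VALUES (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2);
""")

ROOT_OF = """
    WITH RECURSIVE ancestors(node_id, name, parent_id) AS (
        SELECT node_id, name, parent_id FROM nodes WHERE node_id = ?
        UNION ALL
        -- Step from the current node to its parent.
        SELECT n.node_id, n.name, n.parent_id
        FROM nodes AS n
        JOIN ancestors AS a ON n.node_id = a.parent_id
    )
    SELECT name FROM ancestors WHERE parent_id IS NULL
"""


def root_name(conn, node_id):
    row = conn.execute(ROOT_OF, (node_id,)).fetchone()
    return row[0] if row else None


print(root_name(conn, 3))  # A
```

Because nothing is stored, re-parenting A under a new node immediately changes the answer for C, with no second table to keep in sync.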

Source: https://stackoverflow.com/questions/40985510/what-is-best-design-for-one-to-many-relationship-with-back-references-to-each-ot

Tags

sql

database-design

database-normalization