In trying to learn the art of data storage I have been trying to take in as much solid information as possible. PerformanceDBA posted some really helpful tutorials/examples
... Part II
Holy Toledo ! You are cooking with gas, young fella. All the issues are either minor, or relate to the new step you are learning.
Identifiers vs Id columns
I am not going to give you a full rundown here, as I have posted at least 20 times about how Id columns cripple a database and rob it of Relational Power. I will deal with the issue in the context of this question only.
Here is ▶an example◀, check the question in detail first. Note that Mark is quite capable, but completely stuck. Then read my answer, then look at the Data Model. (Please do that now, it provides context)
The idea is either model the data, as data, which we are doing, and you will end up with a database, xor stick Id columns on everything that moves, which obstructs the modelling exercise and Normalisation, and you will end up with a bunch of spreadsheets "linked" to each other with massive duplication and no performance.
Therefore, remove all columns of the form [Table]Id from all tables (leave the Migrated Keys alone, they are correct), and note how ERwin will correct all child, grandchild, etc. tables. The exceptions are the following tables, which carry the major Identifiers reflected throughout the database:
Party
Address
Item
Relational/IDEF1X Identifiers
You are learning about Identifiers. These are the Natural Keys. Either Keys that the user uses, or Keys that have been Migrated from a parent to a child as Foreign Keys. These are therefore not only identifying the Relation but also Identifying the child. Your last name tells me not just about you, but also about your father, and also that you are your father's son. Want to make that unique ? No problem, just add a first name.
You have been reading my answers, looking at my Data Models, and then adding Identifiers to your model. It is much easier than that. ERwin (since it implements IDEF1X) does that for you.
Take Party, Band and Person. The Identifier for Party is PartyId
(OK, that is a Surrogate Key, not a Natural Key; but the Natural Key (LastName, FirstName, BirthDate, etc.) is very long. If we used that as the Primary Key, it would be Migrated to the children, grandchildren, great-grandchildren, which is not desirable. So we add a short Surrogate Key, and make that the Primary Key.)
When you create the subtypes in ERwin, and indicate the Relation, it will automatically place PartyId in Band and Person, as the PK; it will mark it as "(FK)". (Note: I use bold font to denote (FK) in my models.)
That's it, you are done. Party::Band is 1::0-1, and the Band Primary Key is PartyId. Because it is a Subtype, ERwin will ensure the Relation is Identifying, and therefore the parent PK ends up in the child PK, and the Dependent child has round corners.
FirstName, or SequenceNo Role
Now for the next step. We know that Band::Party is 1::1; that Band is a child of Party; that Band.PartyId is the perfect PK (no Id column is required). Same for Person. But they are silly names, or put another way, Band is actually a different Role to Person, and they are both a Party. So we want to identify the Role clearly.
In Band, we would like to call PartyId, BandId, to reflect its Role. Edit the Relation, between the subtype symbol and the child, not the table. In the dialogue, fill in the RoleName as BandId. That's it. You are done.
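For readers without ERwin to hand, here is a minimal sketch of what the Subtype and RoleName steps amount to at the DDL level. It uses Python's built-in sqlite3 purely as a convenient way to run the SQL; the Name column on Party and the helper add_band are assumptions for illustration, not part of the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Party (
    PartyId INTEGER  NOT NULL PRIMARY KEY,  -- short Surrogate Key
    Name    CHAR(30) NOT NULL               -- hypothetical attribute
);
-- Subtype: the PK is the Migrated Party PK, carrying the RoleName BandId.
CREATE TABLE Band (
    BandId INTEGER NOT NULL PRIMARY KEY REFERENCES Party (PartyId)
);
-- Same pattern, different Role.
CREATE TABLE Person (
    PersonId INTEGER NOT NULL PRIMARY KEY REFERENCES Party (PartyId)
);
""")
conn.execute("INSERT INTO Party VALUES (1, 'The Band')")
conn.execute("INSERT INTO Band VALUES (1)")


def add_band(band_id):
    """Try to insert a Band row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Band VALUES (?)", (band_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Here add_band(99) is rejected because there is no Party 99: a Band cannot exist outside the context of its Party, which is what the Identifying Relation says.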
Thus the following change from ... to:
Consequently ...
FloorItem.ItemId -> FloorItemId
BandItem.ItemId -> BandItemId
Other.BandItemId -> OtherId
Album.BandItemId -> AlbumId
Song.BandItemId -> SongId
Performance.BandItemId -> PerformanceId
Removing all the [Table]Id columns will leave the following tables without a PK. For now, add a Name column as the PK. You can tell me later what the user would like for a natural key, an Identifier for these tables:
Event
Genre
PartyAddress is an example (ie. modelled correctly) of what all I have discussed above. It had no PartyAddressId. PartyId and AddressId together form the PK. Both Relations are Identifying.
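As a rough sketch of the PartyAddress shape in DDL (run through Python's sqlite3 for convenience): there is no PartyAddressId, the two Migrated keys together form the PK. Party and Address are cut down to their keys here, and add_party_address is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Party   (PartyId   INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE Address (AddressId INTEGER NOT NULL PRIMARY KEY);
-- Both Relations are Identifying: each parent PK is part of the child PK.
CREATE TABLE PartyAddress (
    PartyId   INTEGER NOT NULL REFERENCES Party (PartyId),
    AddressId INTEGER NOT NULL REFERENCES Address (AddressId),
    PRIMARY KEY (PartyId, AddressId)
);
INSERT INTO Party VALUES (1);
INSERT INTO Address VALUES (10), (11);
INSERT INTO PartyAddress VALUES (1, 10);
""")


def add_party_address(party_id, address_id):
    """Try to insert a PartyAddress row; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO PartyAddress VALUES (?, ?)", (party_id, address_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The composite PK rejects a second (1, 10) row, and the FKs reject an AddressId that does not exist, with no extra Id column and no extra index to carry.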
Identifying vs Non-identifying Relations
In reading a lot on the subject there seems to be a lot of disagreement and indecisiveness on the subject
Yes. Unfortunately, anyone with a keyboard and a modem can "publish" these days. People post opinions as facts; they post nonsense about subjects they are clueless about. This confuses people who are trying to learn.
It is science, not magic or a black art, not opinion.
When learning, read only definitions, and listen only to people who clearly transfer the science (not to anyone who is confused or treats the science like it is an art or that it is subject to opinion). We are learning facts, laws of physics, not opinions about the laws; the laws work the same for everyone, across the planet. You can't learn from someone who thinks a fact is an opinion.
Let's take it from the top:
The Relation is the defining criterion (it renders the child Independent/Dependent), not the other way round.
A Relation is always an FK in the child, of the parent PK.
In an Identifying Relation, that FK is the PK (or the first part of the PK, where the PK is a composite key). And the child is a Dependent table.
In a Non-Identifying Relation that FK is a non-PK column, and the child is Independent (it may be forced into Dependence by some other Relation).
All Subtypes have Identifying Relations from the Supertype. Otherwise they would not be Subtypes, they would be Independent of the Supertype.
All 1::0-1 Relations are Identifying.
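The definitions above are mechanical enough to check in code. The following is a sketch only, using Python's sqlite3 and its table_info and foreign_key_list pragmas: a Relation is reported as Identifying when every FK column is part of the child PK. The two child tables are cut-down versions of Order and Event; the Event.Genre column and the helper relation_kinds are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Party (PartyId INTEGER  NOT NULL PRIMARY KEY);
CREATE TABLE Genre (Name    CHAR(30) NOT NULL PRIMARY KEY);
-- Identifying: the FK (PartyId) is the first part of the PK.
CREATE TABLE "Order" (
    PartyId INTEGER NOT NULL REFERENCES Party (PartyId),
    OrderNo INTEGER NOT NULL,
    PRIMARY KEY (PartyId, OrderNo)
);
-- Non-Identifying: the FK (Genre) is a non-PK column.
CREATE TABLE Event (
    Name  CHAR(30) NOT NULL PRIMARY KEY,
    Genre CHAR(30) NOT NULL REFERENCES Genre (Name)
);
""")


def relation_kinds(conn, child):
    """Map each parent of `child` to 'Identifying' or 'Non-Identifying'."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 means in the PK.
    pk_cols = {
        row[1]
        for row in conn.execute(f'PRAGMA table_info("{child}")')
        if row[5] > 0
    }
    # foreign_key_list rows: (id, seq, table, from, to, ...).
    fk_cols = {}
    for row in conn.execute(f'PRAGMA foreign_key_list("{child}")'):
        fk_cols.setdefault(row[2], set()).add(row[3])
    return {
        parent: "Identifying" if cols <= pk_cols else "Non-Identifying"
        for parent, cols in fk_cols.items()
    }
```

Calling relation_kinds(conn, "Order") reports the Relation from Party as Identifying, and relation_kinds(conn, "Event") reports the Relation from Genre as Non-Identifying. Note the direction: the function reads the Relation and only then can you say whether the child is Dependent.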
so I did what I thought represented the right things in my model.
[Table]Id keys. When to force (identifying) and when to be free (non-identifying)?
Never force anything re modelling (Database and Function), especially re data. It is the uncontrollable that we want to administer, manage, mould, control, etc. But to do that effectively, we have to understand it first. We cannot understand anything when we force it. Forcing it deprives it from exposing itself, and deprives us from noticing the subtleties and flavours (because we "know" it). Let it be free, but constrained, like a horse in a paddock, not a prisoner in a stable.
That is why the act of sticking Id columns on every spreadsheet prevents understanding of the data, and therefore any modelling of it.
As per above, it is the Relation that is Identifying or not; not whether the Entity is Independent or not, that is a consequence.
Do it Relational/IDEF1X/ERwin style:
You want an Entity, draw an Entity. Name it. Unless it is the first entity on the canvas, do not add keys.
Now consider its Relations. How do the Entities you have already modelled relate to this new Entity ? Draw that Relation (Relations are drawn Parent-to-child).
Of course, it defaults to Identifying, because most Relations in a (wait for it) Relational Database are Identifying. The parent PK is placed in the child PK.
If you think, no, no, I want this to be Independent, then you better have a good reason. The key question here is, does this entity exist completely on its own, does it exist outside the context of other Independent entities ? AFAIC, there are five in your model:
Address
Party
Item
Event
Genre
Every other entity exists only within the context of one of these Independent Entities. Thus you drew Identifying Relations, and thus they are all Dependent.
Recall, we had Item as Independent earlier; then we had a new form of Item; which made the old Item, BandItem; which made BandItem Dependent on the new Item.
We had a great Identifier in ItemId, which was carried not only in the (then) Item cluster but throughout, in OrderItem, Review, etc.
We changed the context of Item (created a higher order Item), and due to the Identifying Relations, that was then Migrated throughout, and the new BandItem was Migrated in its context.
The new ItemId continues to be a great Identifier. BandItemId is exactly ItemId, but plays a particular Role, it is a subset/subtype of ItemId.
So if it is a true Independent entity, go ahead and give it a new PK.
But at this stage, not an Id column, something meaningful that identifies the entity. Event.Name, Customer.Code. No human being identifies a Customer as number 123456, no, they think of "IBM", "3M", etc. Later on, as the model progresses, we will make sure we have really good Keys; right now with the new Entity, we care that it has an Identifier.
Exception. For Address, Party, Item, you knew at V1.0 you were going to have millions, thousands, thousands of them; that these were major Identifiers that would be Migrated throughout the database; that the true PK was very long; and that you needed a short Surrogate Key as the PK; so you set that up from the outset, and you got no argument from me.
If you are ready for Domains, then INT, INT or SMALLINT, SMALLINT.
Otherwise Name, CHAR(30).
The next step is to finish the PK on the new entity. If the cardinality from the parent is 1::n, it already has the PK of the Parent, just add an element to make the PK unique. Let's look at Order. It already has PartyId, so OrderNo can be within PartyId. Just change the order of the PK columns to (1) PartyId, (2) OrderNo.
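To make "OrderNo within PartyId" concrete, here is a minimal sketch using Python's sqlite3. The helpers next_order_no and place_order are hypothetical, and the MAX + 1 approach assumes a single writer; a real system needs to handle concurrent inserts for the same Party.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Party (PartyId INTEGER NOT NULL PRIMARY KEY);
-- PK column order: (1) PartyId, (2) OrderNo.
CREATE TABLE "Order" (
    PartyId INTEGER NOT NULL REFERENCES Party (PartyId),
    OrderNo INTEGER NOT NULL,
    PRIMARY KEY (PartyId, OrderNo)
);
INSERT INTO Party VALUES (1), (2);
""")


def next_order_no(conn, party_id):
    """Next OrderNo for this Party: the numbering restarts at 1 per PartyId."""
    row = conn.execute(
        'SELECT COALESCE(MAX(OrderNo), 0) + 1 FROM "Order" WHERE PartyId = ?',
        (party_id,),
    ).fetchone()
    return row[0]


def place_order(conn, party_id):
    """Insert a new Order for the Party and return its OrderNo."""
    order_no = next_order_no(conn, party_id)
    conn.execute(
        'INSERT INTO "Order" (PartyId, OrderNo) VALUES (?, ?)',
        (party_id, order_no),
    )
    return order_no
```

Each Party gets its own sequence 1, 2, 3, ..., so the pair (PartyId, OrderNo) is unique without any OrderId column.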
The only time we do a little bit of forcing, is when the number of columns forming the PK becomes too many, or the total width of the PK becomes too wide, to Migrate as an FK into the children. Then, and only then, we create an additional Surrogate Key of the form [Table]Id (they are always additional, we can't lose the real PK or the uniqueness, because it supports other requirements).
AFAIC, that magic number is seven (a magic number for many things, actually; even this item appears as number seven), and that maximum width is 30 bytes. That was done from the outset with Address (already highly optimised), Party (otherwise 64 bytes), Item (over 30 bytes).
If we are going to break the intrinsic Relational power, we need the pain of carrying that Relational power itself to be really bad, and for no other reason. Not even approaching that in your model.
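A sketch of "additional, not instead of", using Python's sqlite3: the short Surrogate Key becomes the PK that is Migrated to the children, while the real key keeps its own uniqueness constraint. The Natural Key columns follow the Party example given earlier, simplified to a Person-style key; the column widths and the helper add_party are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Party (
    PartyId   INTEGER  NOT NULL PRIMARY KEY,  -- additional Surrogate Key
    LastName  CHAR(30) NOT NULL,
    FirstName CHAR(30) NOT NULL,
    BirthDate DATE     NOT NULL,
    -- The real key is not lost: uniqueness is still enforced on it.
    UNIQUE (LastName, FirstName, BirthDate)
);
""")


def add_party(party_id, last_name, first_name, birth_date):
    """Try to insert a Party row; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Party VALUES (?, ?, ?, ?)",
            (party_id, last_name, first_name, birth_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Without the UNIQUE constraint, the same person could be entered twice under two different PartyIds, which is exactly the duplication the Id-on-everything approach produces.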
Review Cluster
You've done a very good job, so consider this as the next progression. Basically you have two options, and of course we are comparing/relating this to the Item cluster.
Going with the Review cluster as is. We need a SongReview and an AlbumReview. And get rid of ItemReview (that encapsulates all Items, which means we are doubling up). I thought we were excluding Reviews for non-Band Items.
Allow the non-BandReview to be about any BandItem, eg. change the ItemReview FK from Item to BandItem. That encapsulates all BandItems in ItemReview. Get rid of PerformanceReview.
Colour
It is great that you have adopted my colour scheme.
The meaning, the visual relevance, does not show up in a tiny model (most of my models on SO); it only shows up on larger models such as yours.
Because you have done such a great job with V1.3, the teacher has an ▶apple for you◀. Actually, the ▶IDEF1X Notation◀ document is worth reading again, it is very condensed, and I am told that people get more value out of it when they read it after modelling something. What I need to know is, whether the Natural Hierarchy and the Colour do anything for you.
That's just finishing off the Entity level Logical.
You can continue with the Logical, Key level (the only Attributes are FKs, and we know what they are). But feel free to start identifying Attributes (in which case, show the Attribute Level).
Optional Column
U.1) An Optional Parent has crept back into the model. The Relation "PartyAddress is shipped for Order" is not Nullable.
If you intended to model that the shipping address is optional, then you need an OrderShipAddress Entity, which is a child of Order, and the cardinality is 1::0-1.
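A sketch of the OrderShipAddress option in DDL, via Python's sqlite3. Instead of a Nullable FK in Order, the optional fact lives in its own child table whose PK is the Order PK, which gives 1::0-1. Party and Address are omitted for brevity, the sketch assumes the shipping address is one of the ordering Party's own addresses, and set_ship_address is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PartyAddress (
    PartyId   INTEGER NOT NULL,
    AddressId INTEGER NOT NULL,
    PRIMARY KEY (PartyId, AddressId)
);
CREATE TABLE "Order" (
    PartyId INTEGER NOT NULL,
    OrderNo INTEGER NOT NULL,
    PRIMARY KEY (PartyId, OrderNo)
);
-- Child PK = parent PK, so an Order has zero or one shipping address.
CREATE TABLE OrderShipAddress (
    PartyId   INTEGER NOT NULL,
    OrderNo   INTEGER NOT NULL,
    AddressId INTEGER NOT NULL,
    PRIMARY KEY (PartyId, OrderNo),
    FOREIGN KEY (PartyId, OrderNo)   REFERENCES "Order" (PartyId, OrderNo),
    FOREIGN KEY (PartyId, AddressId) REFERENCES PartyAddress (PartyId, AddressId)
);
INSERT INTO PartyAddress VALUES (1, 10), (1, 11);
INSERT INTO "Order" VALUES (1, 1), (1, 2);
""")


def set_ship_address(party_id, order_no, address_id):
    """Try to record a shipping address; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO OrderShipAddress VALUES (?, ?, ?)",
            (party_id, order_no, address_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

An Order with no shipping address simply has no row in OrderShipAddress, so no column anywhere needs to be Nullable.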
Minor
M.11) These were correct in V1.2
Review::Comment is 1::0-1
BandMember::Comment is 1::0-n
M.12) Event::Person is n::n (and the columns will not show at the logical level)
Very good progress. Are you happy with the Identifiers, the Keys ?
U.8) (If you do this first, the remainder will follow easily.) ERwin Limitation. Congratulations, you have produced a model that has reached the limitations of ERwin's capability in Logical modelling. To be clear, this is not really a limit, in that it gets resolved in the Physical Model, and of course it is not a limitation in IDEF1X or Relational Databases. But right now, at the Logical, it interferes with your learning and progress.
In BandItem we want the PK to be (BandItemId, BandId). But ERwin won't allow it because it says a Subtype PK must be the Supertype PK and nothing but. Actually, as long as the Supertype PK is the leading Identifier, another Identifying Relation is acceptable. To work around this:
The Relations that we had to make Non-identifying can now be Identifying.
ERwin will now resolve the Migrated PKs as FKs, without duplication.
Yes, chuck the Roles back in.
U.9) Now I understand what you are trying to do with the Review cluster, so first, let me say that you have modelled it correctly, all the way down to Rating.
M.13) Order::OrderShipAddress is 1::0-1, correct. PartyAddress::Order is 1::0-n, correct. Therefore the Shipping Address should be PartyAddress::OrderShipAddress 1::0-n.
M.14) Payment currently allows only one payment per Order, which may be what you require, but the relation is 1::1-n. If you need more, then add a SequenceNo to the PK.
M.15) Genre is fine. But SubGenre needs something in the PK to allow more than one Genre. I would now change Genre.Name to Genre.Genre, and add SubGenre to the SubGenre PK. That will fix Event.GenreId as well.
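One reading of M.15, sketched in DDL via Python's sqlite3: Genre.Genre is the PK, it is Migrated into SubGenre, and the SubGenre column completes the child PK so that a Genre can hold many SubGenres. The column widths and the helper add_sub_genre are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Genre (
    Genre CHAR(30) NOT NULL PRIMARY KEY
);
-- Identifying Relation: the Genre PK leads the SubGenre PK.
CREATE TABLE SubGenre (
    Genre    CHAR(30) NOT NULL REFERENCES Genre (Genre),
    SubGenre CHAR(30) NOT NULL,
    PRIMARY KEY (Genre, SubGenre)
);
INSERT INTO Genre VALUES ('Rock');
""")


def add_sub_genre(genre, sub_genre):
    """Try to insert a SubGenre row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO SubGenre VALUES (?, ?)", (genre, sub_genre))
        return True
    except sqlite3.IntegrityError:
        return False
```

With only the Migrated Genre in the PK, a second SubGenre under 'Rock' would be rejected; with (Genre, SubGenre) it is accepted, and duplicates are still refused.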
M.16) Venue needs a Name for a PK for now. If you are ready for better keys, then ShortName, and Name moves down as an attribute.
Q.4) Confirming. Since we have an Identifying Relation in Order, and the PK (PartyId, OrderNo) therefore OrderNo is a sequential number within PartyId, correct ?
Go for V1.5. Include some Attributes. The best way to identify them is to either start a Function Model (and now work the Data Model side-by-side with it) or at least work through all the functions for all the screens.
Cheers