What are the basic rules for defining entities, attributes, and relationships in an ER-Diagram?

十年热恋 提交于 2019-12-02 09:55:26

In the original Entity-Relationship method, we identfy entity types and relationship/association types and their relevant attributes. When drawing an ER diagram we pick one or more groups of attributes that can identify an instance of a type as a key. If there isn't such an identifying group of attributes then we have to introduce an attribute as a surrogate. A type's value set (the set of its instances) will be represented by a relation/table.

From Chen's 1976 The Entity-Relationship Model-Toward a Unified View of Data:

Basically, an entity key is a group of attributes such that the mapping from the entity set to the corresponding group of value sets is one-to-one. If we cannot find such one-to-one mapping on available data, or if simplicity in identifying entities is desired, we may define an artificial attribute and a value set so that such mapping is possible.

These are the only attributes we must have. (Surrogate attributes arise from this necessity.)

If you are not interested in any non-key attributes of an entity type or relationship type, then you're not going to have any other attributes for it in your design/diagram.

An ER diagram shows entity types and relationship types. It doesn't show entity instances or relationship instances.

An entity exists only if it has properties/attributes. You can not define entities without properties/attributes, even if they have primary key.

Entity types in partitioning relationships (represented by the triangle symbol) exist without primary key in a Conceptual Model. But they exist only if they possess attributes equally. Once decomposed the relationship, in the Physical Model, they will have the primary key of your super entity as a foreign key and also as a primary key.

1-1 and 1-N (or N-1) relationships do not have attributes/properties. On the other hand, N-N relations may or may not have properties/attributes. If you need to set attributes/properties to N-1 (or 1-N) relationships, they should be on the N side.

The Conceptual Model may have N-N relations, but the Logic Model does not. When decomposed, the Logical Model should end up with only 1-1 relationships or/and 1-N (or N-1) relationships.

Any questions, please comment and I will answer.

EDIT

The @reaanb user raised an interesting question. I think it is valid to add content as the answer itself. You (the author of the question) can even remove some doubts about what I said. So let's see...

When you decomposes 1-1 or 1-N relations that has attributes, while maintaining the attributes in the relationship (according to what has been mentioned in the comments), you will have to create another table that need to at least have a foreign key to one of the sides of the relationship.

In terms of functionality, keep attributes on the N side can save you the creation of this table (in case, you don't have to create this table). One less table within the final model is one less join operation, which can be costly if you need to do many times.

In terms of readability, 1-N relationships represent unilateral relations. Something that, in my view, always creates a dependency on a single side, not on both. Because of this, changes to the N side are simply irrelevant to the 1 side. In programming, development of programs and systems, at least, it follows the concept of decreasing the coupling between the parties, Which in this case, is a good thing (even for the development of databases).

In 1-N (or N-1) relationships that have attributes/properties, as I have said, the attributes are put on the N side, because/due to unilateral relationship created they are relevant only to the N side.

The only situation where this can complicate is for 1-1 relations. In this type of occasion, we have to consider two things:

  • Who is dependent?
  • Who, in the future, can change its cardinality to N?

1-1 relations may be confusing as they may (at first sight) represent unilateral relationships. But it is rule that in every way we need to define a foreign key, right? We need to create a relationship after all. For so cases, it is the discretion of the Data Modeler. So we generally put the foreign key on the side that can eventually become N. But this is not mandatory. Again, you must have discretion.

Now, and when 1-1 relations have attributes? That's a lot, if not impossible (I kindly ask an example of context if anyone disagree with a plausible explanation of a real case, and if it exists, be sure that I will take back what I said). If it happens it's good to review your diagram. Ask yourself immediately:

  • For what reason these attributes are there?
  • Are they relevant to both sides?

Attributes in relationships are great indications that we have N-N or 1-N (or N-1) relationships happening, or that you are putting attributes in the wrong place. In this case simply just follow the aforementioned rules here.


How about a quick example to show practice (about 1-1 relationships)? Let's imagine the following situation (context):

Imagine that we have a company where we have employees in their positions/roles. Imagine that in our company we have office rooms. Each office and employee has a proper name. Imagine now that an office has only one employee, and an employee works in one office.

This would give us the following Conceptual Model:

So far so good, right? Now let's add one more detail:

Each employee in every office must work about X hours a day. You need to know how many hours an employee works, and which days he/she was working.

Where will you add this information? Do not see the answer yet. Stop to think a little. Stop to think, really. Take a look at the context. This information is relevant to whom? To both sides? Let us develop a line of thought together...

Let's suppose we were to add a multi-valued attribute named "hours of work", right? This attribute when decomposed will become a new entity with at least two new attributes/fields: the days and the hours an employee worked in an office.

Note: Ideally, we transform this property into one entity directly, but for the sake of understanding, we will leave it as an attribute.

This information needs to be persisted, so Where should we put this? In the relationship? In Employees entity? In Offices entity? Where?

If we use the relationship, then we are saying that when we do the decomposition of the Conceptual Model, the relationship will become a new table. This table will have at least three fields:

  • Employee (FK)
  • Office (FK)
  • hours of work (FK)

The first two fields are obvious. The last field will represent a foreign key to a new table that arose from the decomposition of a multi-valued attribute.

This relationship can accurately show us who, in which office, worked in which days for how long. But here are the caveats:

  • You created one more table, which means one more join in queries.
  • With the creation of an intermediate/link/join table if you're not careful, many employees can work in one or several different offices.
  • You may end up giving a wrong workload to the relationship established between the two entities if you get confused. You need to know in addition to the employee, the office as well.

Let us dismiss this idea for now to examine another case. Let's say we did not put the attribute in the relationship, and we still have to put it somewhere. What would happen if we put the attribute in Offices entity? Does it makes sense to store this information here? What if in the future an office can have several employees working at it? Is this the best place to store this information?

If this should happen, keep this information here will not help. This attribute keeps the workload of just one employee. With several employees working in an office, you would not have, even with a SQL join, how to say who worked how long in what office. You would have anyway to create a relationship between the three: employees, offices, and working hours, and again we would go back to the relationship made (1-1 with attributes till it becomes a new table).

If in the future, an office can have several employees working on, maybe we should put this attribute/property in Employees entity (the N side), right? Take a look:

See how beautiful it is. When all this get decomposed, we will have a workload relationship for each employee, and we will know which employee works in which office. All this is done with a simple SQL join. In any way we define here an attribute in a relationship (Remember that was supposed during the 1-1 relationship that we would put the attribute on the side that could become N in the future).

There's no one more new table, and there is no longer one more table participating in a SQL join. There is no way you accidentally assign more than one office to an employee. You don't need to create more rules (constraints) for that. This brings us to a very important issue in the development/modeling databases:

  • Data Consistency

To a shapely, well prepared, and well managed database it's virtually impossible for client applications to leave it in a inconsistent state, which could completely compromise the entire system (especially for safety or theft of sensitive data). Client applications can validate data (What is costly in terms of maintenance, if not done properly), but the database is the last line of defense for this (I learned that here in the community).

Anyway, if we never have to change the cardinality between the entities, we need to have discretion. In 1-1 relation created earlier, I/me, as a professional, would have consulted my client and ask how he/she thinks his/her company will grow in the future.


As a final note (as if I had not already talked a lot... lol), I want to say that the notations and the knowledge that I used here about Data Modeling differs somewhat from the way many people perceive in many countries.

When I speak, for example, about 1-N (or N-1) relationships, I am not considering that the 1 side may have cardinality zero. Many people use other notations, such as 0...N, or (0,3), and so on. These notations are not nonsense, they are official and are used to model data in the same way.

I hope I have spent a bit of my experience to someone, and I'm happy with it. We are all professionals, but we are all colleagues and we are here for the same purpose: exchange knowledge. If anyone has any questions, please comment and I will answer. Thanks for listening.

EDIT 2

More interesting questions were raised in the comments. I have to agree that in my first post, the way I put things may seem to some people senseless. But they are not. What I put are rules that I learned during my years of course.

The way I learned here I my country is that within the ER-Diagram we have 3 stages of development for modeling data. These stages are made official by its original author, Peter Chen. Still, it was been seen over the years the need for the existence of a fourth model, known by the name Descriptive Model. This model reveals the problem/context by saying what the situation/case of the real world that needs to be addressed. In the case, what your customer/client needs.

It's need for existence appeared as often more of a professional can work or start working on a project, and he/she must know what he/she is working, not being enough, often, to only look at the final diagram. In summary, this template serves as a starting point, both for the development, as to the understanding of a problem.

Once you have this model in hand, which describes all the customer's needs, you remain faithful to it, and proceeds to the next stage of development, thus starting the Conceptual Model.

For the conceptual model, many professionals (as I have seen) use different techniques and devices for their development. During my studies, I learned from my teachers that the transformation from one situation/problem/context to the Conceptual Model we must follow rules. These are rules which are mentioned in my original post.

For what reason we have to follow these rules?

For we do not begin to guess, or to draw nonsense.

What were the rules passed to you to identify entities for the Conceptual Model?

An entity is identified when we have something in the real world that has attributes/properties, and we have to persist these attributes. Simple entities are generally seen by nouns. But not in every case we have entities with noun names.

For what reason an entity must have attributes to be added to the model?

Because we can often identify entities this way that do not have attributes, but when we decompose this, we end up with tables without columns, without data being persisted, which often have only foreign primary key.

Note that, often this can be used to create subtypes of things, for example, individuals, legal entities, vehicles, cars, motorcycles, etc. But that, depending on the situation, can be harmful, because the addition of new types implies a change in the model, in the database, the rules of the bank, in the systems that use it, and so on.

This was the reason I mentioned about partitioning entities, as this is a technique for creating types and subtypes of things.

Still, can I set the entity?

Yes. The model is yours, and it is in your perspective if you should or should not end up with another table.

Could you give an example of identifying entities?

Imagine that we have a car rental company. Our customers are registered in the system, along with the vehicles. Every client/customer has name, contact, address, etc. Every vehicle has a model, card/plaque/plate (I really don't know the translation of this to you here), year, etc.

We can easily find two entities here:

  • Vehicles
  • Customers

Why?

  • They have properties.
  • The context/situation/problem/client calls for them to be persisted in the database.

Could you show me a context where we can define entities in the wrong way? In case, could you show me a context where it is dangerous to just set attributeless entities?

Imagine that we have a supermarket. Similarly, we have registered customers in this store. However, our supermarket sells to individuals and legal entities.

If you could not identify types here properly, you may end up with problems.

Why?

If we have only to persist the type of client we sell, this type end up as an attribute, not an entity. If the types of customers/clients have attributes which some are different, and some are equal, then we have a partitioning. We will have a super "Client" entity, and two dependent entities:

  • Individual
  • Legal Entity

Why?

  • You may end up with additional tables which have no relevant attributes to be persisted.
  • It is easier to see relationships and attributes this way.
  • Perform maintenance in the future becomes more practical, it takes less time and is less costly.

Right... And how can I identify relations / relationships?

That depends on the context. You need to see:

  • Two entities are related in some way?
  • Do I need to persist this?

If the answer is yes to both questions, so be sure that we have a relationship going on here, and we need to represent it in the diagram.

Could you give an example?

Again, following the example of the car rental company, imagine you have to keep/save/persist/register/remember the rent event made of a customer with a vehicle.

If we have two entities already defined our context now it says that there is an event happening between them. The mentioned event is the rent. Take a look, this is a relationship, and we have to persist in such a way.

Could these two entities create other types of relationship between them?

This will depend entirely on the context. Instead of the rent event, do you need to persist who licked the car? So this is an event, an relationship (strange, but it is). Note, it is the context who dictates the relationship.

And what about to the attributes, how can I define them correctly?

Initially, attributes belong to entities. The attributes represent the characteristics of the entities that need to be persisted. You should not put attributes:

  • If they should not be persisted (they do not appear in context, were not requested by the client).
  • If they do not logically belong to an entity. For example, placing the attribute price in a customers entity, rather than put it in an products entity.

For what reason you said "initially" attributes belong to entities?

Because attributes can exist in relationships as well, but when they do, they are in N-N relationships.

Why?

Again, this is a rule, and this has been explained in my first edition.

From what you have studied, what are the differences between the 3 types of models created and made official by Peter Chen?

  • In the Conceptual Model: Here we must define entities, attributes, and relationships.
  • In the Logical Model: Once the previous model is ready, this model shows a decomposed version of everything. Relationships are solved (1-N, N-1 and N-N, aggregations, partitioning, recursive relations), also special attributes (functional redundancy, multi-valued attributes). Everything becomes entity (practically tables), and all relationships end in 1-N (or N-1).
  • In the Physical Model: With everything already set in the previous phase, now the names of the entities are modified to their versions in tables, as well as their attributes. Other than that, its attributes now show the primitive data type present in the database that will be used. The drawing diagram itself takes on another appearance. Something more like this.

Can you model any database, any real-world situation with all this?

Yes.

Do you take into account the data inconsistency and inappropriate creation of additional structures with that?

Yes. All of this is taken into account. Just the fact that you follow these types of rules already makes the database designed to not create and enable/allow structures or failed relationships.

In short, you create only what you need, without allowing for human error, to put something in the wrong way/place.

How?

All this was explained above, in all that has been said. I will not repeat myself here. You just have to read.

In case you need something lighter, ask, show me where you have doubt and I'll leave it clearer.

For what reason you again edited your answer?

As I said, I think in my first original response I was not very clear, as commented. This edition should:

  • Clarify what was unclear.
  • Give real examples of situations and cases.
  • Show that there is ground for all that I speak.
  • Show the initial steps in data modeling in more detail.
  • Show how I learned here in my country.
  • Share more knowledge/information, which I believe is the purpose of this community.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!