Explain Like I am Five -> How a Primary Key Satisfies First Normal Form

问题

Thank you for your knowledge in advance. I am studying for the Microsoft Technology Exam and one of the practice questions is :

Creating a primary key satisfies the first normal form. True or False?

I personally think it is False because the first normal form is to get rid of duplicate groups. But there is a sentence in the text (Database Fundamentals, Exam 98-364 by Microsoft Press) that says the following:

"The first normalized form (1NF) means the data is in an entity format, which basically means that the following three conditions must be met: • The table must have no duplicate records. Once you have defined a primary key for the table, you have met the first normalized form criterion."

Please help me understand this, please explain like I am five. Thanks.

回答1:

A primary key must be completely unique. So once this is part of a record, it is distinct from any other record.

eg.

Record 1
---------
KEY = 1
Name = Fred Boggs
Age = 84


Record 2
--------
KEY = 2
Name = Fred Boggs
Age = 84

These 2 records are different because the field KEY is different. Therefore although the rest of the data is the same, it meets the requirements for 1NF.

回答2:

I can't explain this stuff to a five year old. I've tried. But I may be able to shed a little light on the subject. The first thing you need to know is that there have been multiple definitions of 1NF over the years,and these definitions sometimes conflict with each other. This may well be the source of your confusion, or at least some of it.

A useful thing to know is what purpose Ed Codd had in mind when he first defined it. Ed Codd defined First Normal Form, which he called Normal Form, back in the paper he published in 1970. His purpose in that paper was to demonstrate that a database built along relational lines would have all the expressive power that existing databases had. Existing databases often dealt with a parent that owns a set of children. For example, if the parent data item contains data about a student, each child might contain data about one course the student is taking.

You can actually define such a structure in terms of mathematical relations by allowing one of the attributes of a relation to be itself a relation. I'm going to call that "nesting" relations, although I don't recall what Ed Codd called it. In defining the relational data model, which is closely patterned after mathematical relations, Ed Codd wanted, for a variety of reasons, to forbid such a structure. his reasons were mostly practical, to make it more feasible to build the first relational database.

So he devoted some of his paper to proving that you could limit attributes to "simple" values without reducing the expressive power of the relational data model. I'm going to sidestep what "simple" means for the moment, although it's worth coming back to. He called this limitation "normal form". Once a second normal form was discovered, normal form got renamed to first normal form.

When it came time to build a relational database the engineers decided on a data structure called a "table". (I don't know the actual history, but this is approximate). A table is a logical structure made up of rows and columns. It can be thought of as an array of records, where each record represents a row, and all the records have the same header.

Now, if you want such a structure to represent a relation, you have to throw in a restriction that will prevent two rows with exactly the same values. If you had such duplicates, this would not represent a relation. A relation, by definition, has distinct elements. This is where primary keys come in. A table with a primary key can't have duplicate rows, because it can't have duplicate keys.

But I'm not done yet. You didn't ask this, but it has come up a thousand times in stack overflow, so it's worth putting in here. A designer can defeat Ed Codd's original intent by creating a column that contains text that, in turn contains comma separated values. In Codd's original formulation, a list of values is not "simple".

This is enormously appealing to the neophyte because it looks simpler and more efficient, to store a table with comma separated values than to create two tables one for parent records and the other for child records, and to join them when they are both needed for one query. Joins are not simple to the neophyte, and they do take some computer resources.

The CSV in a column design turns out to be an unfortunate design in nearly every case. The reason is that certain queries that could have been done real fast via an index now require a full table scan. This can turn seconds into minutes or minutes into hours. It's much more expensive than a join.

So you have to teach the newbies why keyed access to all data is a good thing, and this means you have to teach them what 1NF is really all about. And this can be as hard as teaching a five year old. Newbies are typically less ignorant than five year olds, but they tend to be more stubborn.

回答3:

First Normal Form is mostly a matter of definition rather than design. In a relational system, the data structures are relation variables. Since a relation always consists of unique tuples a relation variable will always have at least one candidate key. By convention we call one key per relation a "primary" key so in a relational database the primary key requirement is always satisfied.

Similarly, in a relational database all attributes contain values which are identifiable by name, not by positional index and so the issue of "repeating groups" does not apply. The concept of a "repeating group" exists in some non-relational systems and that was what Codd was referring to when he originally defined 1NF.

However, problems of the interpretation of 1NF arise because most modern DBMSs are not truly relational even though people try to use them like relational systems. Since SQL DBMSs are not relational, how are we to interpret relational concepts like 1NF in a SQL DBMS?

The essense of 1NF is that each table must have a key and that tuples consist of single values for each attribute. Most SQL-based systems don't support the concept of "repeating groups" (multiple values in a single attribute position) so it is usually safe to say that if a SQL table has a key and does not permit nulls in any attribute position then it is "relational" and satisfies the spirit of 1NF.

回答4:

You are only quoting a fragment of the text Database Administration Fundamentals. A more complete quote is:

The first normalized form (INF) means the data is in an entity format, which basically means that the following three conditions must be met:
• The table must have no duplicate records. [...]
• The table also must not have multivalued attributes, meaning that you can't combine in a single column multiple values that are considered valid for a column. [...]
• The entries in the column or attribute must be of the same data type.

(The history of the term "1NF" is full of confusions, vagueness and changes. But here's what this text says.)

回答5:

Let me join the party ;)

For a question "is this relation in 1NF" to have a meaning, you first need a relation. And for your table to be a relation, you need a key. A table without any keys is not a relation.

Why? Because relation is a set (of tuples/rows) and a set cannot contain same element more than once (otherwise it would be multiset), which is ensured by a key.

Once you have a relation by having a key, you can see if all your attributes are atomic, and if they are, you have yourself a 1NF.

So the answer to...

Creating a primary key satisfies the first normal form. True or False?

...is False. You do need a key, but you also need atomicity.

来源：https://stackoverflow.com/questions/28495501/explain-like-i-am-five-how-a-primary-key-satisfies-first-normal-form

标签

database

database-design

database-normalization