database normalization - merge/combine tables

Question

Please consider the following scenario.

We have a 0NF table

StudentTeacherTable:

StudentName StudentDepartment StudentDepartmentAdd TeacherName TeacherDepartment TeacherDepartmentAdd
    John          CS                  London           Dave        Eng, CS             Oxford
    Mike          CS                  London           Dave        Eng, CS             Oxford
    Chris         Eng                 Oxford           Dave        Eng, CS             Oxford

Ideally after normalization I would like to have tables like

Student Table:

StudentName Department TeacherName
    John        CS         Dave
    Mike        CS         Dave
    Chris       Eng        Dave

Teacher Table:

Name 
Dave

TeacherDepartment Table:

TeacherName DepartmentName
     Dave         CS
     Dave         ENG

Department Table:

   Name Address
    CS   London
    ENG  Oxford

However, if I follow normalization through 3NF, I get

Student Table:

StudentName Department TeacherName
    John        CS         Dave
    Mike        CS         Dave
    Chris       Eng        Dave

DepartmentForStudent Table:

   Name Address
    CS   London
    ENG  Oxford

Teacher Table:

Name 
Dave

TeacherToDepartment Table:

TeacherName DepartmentName
     Dave         CS
     Dave         ENG

DepartmentForTeacher Table:

   Name Address
    CS   London
    ENG  Oxford

My question is: at which step of database normalization (1NF, 2NF, 3NF, etc.) can I merge the StudentDepartment and TeacherDepartment columns into one table, to derive the normalized form I described first?

In other words, by following the normalization rules I end up with a StudentDepartment table and a TeacherDepartment table, rather than one Department table shared by Student and Teacher.

Answer 1:

Your question has nothing to do with normalization. You are asking whether or not to physically merge tables of similar type that have the same set of attributes. Normalization has no preference in that matter, and there is basically no right or wrong. It is a matter of balancing trade-offs for a specific design setup:

option 1: have multiple tables (as you showed in your example):

pros:
- explicit database design -> easy to read
- lower memory/disk space needs, as no type column is needed

cons:
- when using surrogate or other non-natural keys: no unique cross-table identifier, which may make upcoming change requests hard to manage
- viewing across all tables requires lots of unions (especially with more than two tables)

option 2: have one table with an additional type column: pros and cons are the reverse of option 1
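As a rough illustration of the two options, here is a minimal Python sketch using the standard-library sqlite3 module. The table names (student_department, teacher_department, department) and the role column are hypothetical, not taken from the question; the rows follow the Department table the asker wants to end up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: one table per role (hypothetical names).
    CREATE TABLE student_department (name TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE teacher_department (name TEXT PRIMARY KEY, address TEXT);
    INSERT INTO student_department VALUES ('CS', 'London'), ('Eng', 'Oxford');
    INSERT INTO teacher_department VALUES ('CS', 'London'), ('Eng', 'Oxford');

    -- Option 2: one table with an additional type column (here: role).
    CREATE TABLE department (
        name    TEXT,
        role    TEXT,
        address TEXT,
        PRIMARY KEY (name, role)
    );
    INSERT INTO department VALUES
        ('CS', 'student', 'London'), ('Eng', 'student', 'Oxford'),
        ('CS', 'teacher', 'London'), ('Eng', 'teacher', 'Oxford');
""")

# Option 1: listing every department needs a UNION across the tables.
option1 = conn.execute("""
    SELECT name, address FROM student_department
    UNION
    SELECT name, address FROM teacher_department
    ORDER BY name
""").fetchall()

# Option 2: a single table is queried, at the cost of the extra role column.
option2 = conn.execute(
    "SELECT DISTINCT name, address FROM department ORDER BY name"
).fetchall()

print(option1)
print(option2)
```

Both queries return the same two departments; the difference is only in where the complexity sits (extra unions versus an extra column).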

A web search will turn up plenty of resources on this topic.

2 examples: Storing hierarchical data (e.g. single table with type vs multiple tables with 1:1 key and differences...) in Relational Database Design Patterns?

http://sqlmag.com/sql-server/trouble-type-tables

Answer 2:

You write "Ideally after normalization I would like..."

This suggests you have been given the solution, as in an exercise. Always be careful about retro-fitting any work to a pre-set solution; in the case of normalisation, which depends on (and helps to reveal) relations between elements of data, you should be very circumspect about the assumptions underlying one solution or another.

That said, let's try to resolve this, bearing in mind that a set of normalised tables is your result, but normalisation is a process. More precisely, producing the 1NF, 2NF, then 3NF in that order from a small data sample is a precise process, often practised when learning to normalise.

First, let us list the attributes involved. At this point, I'll add surrogate keys that are clearly needed for this data, and identify them with ID:

StudentID
StudentName
StudentDepartment
StudentDepartmentAdd
TeacherID
TeacherName
TeacherDepartment (repeating)
TeacherDepartmentAdd
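
To make the "(repeating)" note concrete: a TeacherDepartment cell such as "Eng, CS" holds two values. Below is a minimal Python sketch of flattening it, assuming the values are comma-separated as in the sample; the function name is hypothetical.

```python
# Sample rows from the question, in the original column order:
# StudentName, StudentDepartment, StudentDepartmentAdd,
# TeacherName, TeacherDepartment, TeacherDepartmentAdd
sample = [
    ("John", "CS", "London", "Dave", "Eng, CS", "Oxford"),
    ("Mike", "CS", "London", "Dave", "Eng, CS", "Oxford"),
    ("Chris", "Eng", "Oxford", "Dave", "Eng, CS", "Oxford"),
]


def unnest_teacher_departments(rows):
    """Return one (teacher, department) pair per value in the repeating cell."""
    pairs = set()
    for _, _, _, teacher, departments, _ in rows:
        for department in departments.split(","):
            pairs.add((teacher, department.strip()))
    return sorted(pairs)


print(unnest_teacher_departments(sample))
# [('Dave', 'CS'), ('Dave', 'Eng')]
```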

Your data is confusing because the sample is small, and it offers few cues of the kind a filled-in form or report would provide. But I believe I can make two assumptions: (1) TeacherDepartment is dependent on the teacher, as the name suggests; (2) each teacher (like Dave in the data) has many students within each department where they work. If this is the case, then "studentdept" and "teacherdept" are best processed as one attribute; the two columns simply help work out the dependencies.

Under these two assumptions, the process becomes familiar, only there are two levels of repeating groups:

      UNF                     1NF                   2NF (and 3NF)

  _TeacherID_            _TeacherID_           _TeacherID_
   TeacherName            TeacherName           TeacherName
   TeacherDepartmentAdd   TeacherDepartmentAdd  TeacherDepartmentAdd
|  Department 
|| StudentID             _TeacherID_*          _TeacherID_*
|| StudentName           _Department_          _Department_
|| StudentDepartmentAdd
                         _TeacherID_  )*       _StudentID_*
                         _Department_ )        _Department_
                         _StudentID_            TeacherID *
                          StudentName         
                          StudentDepartmentAdd _Department_
                                                StudentDepartmentAdd

                                               _StudentID_ 
                                                StudentName
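
One possible reading of the 2NF/3NF column, sketched with Python's standard-library sqlite3 module. The snake_case table and column names and the ID values are my own choices for illustration; the structure follows the five groups in the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the references below
conn.executescript("""
    CREATE TABLE teacher (
        teacher_id             INTEGER PRIMARY KEY,
        teacher_name           TEXT NOT NULL,
        teacher_department_add TEXT
    );
    CREATE TABLE department (
        department     TEXT PRIMARY KEY,
        department_add TEXT
    );
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    CREATE TABLE teacher_department (
        teacher_id INTEGER REFERENCES teacher(teacher_id),
        department TEXT REFERENCES department(department),
        PRIMARY KEY (teacher_id, department)
    );
    -- (student, department) determines the teacher.
    CREATE TABLE student_department (
        student_id INTEGER REFERENCES student(student_id),
        department TEXT,
        teacher_id INTEGER,
        PRIMARY KEY (student_id, department),
        FOREIGN KEY (teacher_id, department)
            REFERENCES teacher_department(teacher_id, department)
    );
""")

conn.execute("INSERT INTO teacher VALUES (1, 'Dave', 'Oxford')")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("CS", "London"), ("Eng", "Oxford")])
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "John"), (2, "Mike"), (3, "Chris")])
conn.executemany("INSERT INTO teacher_department VALUES (?, ?)",
                 [(1, "CS"), (1, "Eng")])
conn.executemany("INSERT INTO student_department VALUES (?, ?, ?)",
                 [(1, "CS", 1), (2, "CS", 1), (3, "Eng", 1)])

# Joining the tables back together recovers the student rows of the sample.
rebuilt = conn.execute("""
    SELECT s.student_name, sd.department, d.department_add, t.teacher_name
    FROM student_department AS sd
    JOIN student    AS s ON s.student_id = sd.student_id
    JOIN department AS d ON d.department = sd.department
    JOIN teacher    AS t ON t.teacher_id = sd.teacher_id
    ORDER BY s.student_id
""").fetchall()
print(rebuilt)
```

Note that there is a single department table here: both the teacher and the student groups reference the same Department attribute, which is what makes the merge fall out of the process.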

Two more assumptions are needed: that the student and department determine the teacher; and that the department determines the department address (where that department teaches). These aren't at all certain from the small data sample, but I accept them on the basis of the result you said you should obtain. In any real situation, you would ask for a larger data sample, or confirm the structure of the data with its actual users. On that basis, the 3NF is the same as the 2NF, so I do not write it out separately above.
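
A small Python sketch of checking such assumptions against the sample. The helper name determines and the dictionary keys are hypothetical. Keep in mind that a sample can only refute a dependency, never prove it, which is why the assumptions still have to be confirmed elsewhere.

```python
def determines(rows, lhs, rhs):
    """Return True if equal lhs values always go with equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # Remember the first rhs value seen for this key; any other value
        # for the same key contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Student-side columns of the sample from the question.
sample = [
    {"StudentName": "John", "Department": "CS",
     "DepartmentAdd": "London", "TeacherName": "Dave"},
    {"StudentName": "Mike", "Department": "CS",
     "DepartmentAdd": "London", "TeacherName": "Dave"},
    {"StudentName": "Chris", "Department": "Eng",
     "DepartmentAdd": "Oxford", "TeacherName": "Dave"},
]

# Not contradicted by the sample:
print(determines(sample, ["Department"], ["DepartmentAdd"]))               # True
print(determines(sample, ["StudentName", "Department"], ["TeacherName"]))  # True
# Contradicted: Dave appears with both CS and Eng.
print(determines(sample, ["TeacherName"], ["Department"]))                 # False
```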

So the data given is compatible with the results you are looking for. But, you should understand:

Normalisation is not normally done from such incomplete information. Here, to arrive at the expected result, we have to assume many things to compensate for the absence of real data.
The purpose of this process is to identify the correct choice of determinants, but it doesn't replace reasoning about which dependencies make sense in your data. Again, this is obvious from this case and from the limited information given by the data sample.

Source: https://stackoverflow.com/questions/29457166/database-normalization-merge-combine-tables

Tags

database

relational-database

database-schema

database-normalization