Elegant normalization without adding fields, extra table. Best relationship

Submitted by 半城伤御伤魂 on 2019-11-27 04:54:55

Question


I have 2 tables I am trying to normalize. The problem is that I don't want to create an extra table with new fields, though a link table might work. What is the most elegant way to convey that the "Nintendo" entry is BOTH a publisher and a developer? I don't want "Nintendo" to be duplicated. I think a many-to-many relationship may be the key here.

I want to stress that I absolutely want the developer and publisher tables to remain. I don't mind creating a link between the 2 with a new relationship.

Here are the 2 tables I am trying to normalize:

Below is a solution I tried (I don't like it):


Answer 1:


I think you want something like this:

Game_Company
ID    Name
 1    Retro Studios
 2    HAL Laboratories
 3    Nintendo
 ...

Company_Role
ID    Name
 1    Developer
 2    Publisher
 ...

Game_Company_Role
CompanyID    RoleID
        1         1
        2         1
        3         1
        3         2
 ...

To get a list of all companies that have role 'Developer':

SELECT gc.Name
FROM Game_Company gc
JOIN Game_Company_Role gcr ON gcr.CompanyID = gc.ID
WHERE gcr.RoleID = 1;  -- 1 = Developer
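A minimal runnable sketch of the three tables above, using Python's built-in sqlite3 module and an in-memory database. The primary keys, UNIQUE constraints and foreign keys are assumptions added for illustration (the answer lists only column names), and `companies_with_role` is a hypothetical helper. It looks the role up by name rather than hard-coding `RoleID = 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Game_Company (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Company_Role (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
-- Link table: one row per (company, role) pair; the composite key is an assumption.
CREATE TABLE Game_Company_Role (
    CompanyID INTEGER NOT NULL REFERENCES Game_Company(ID),
    RoleID    INTEGER NOT NULL REFERENCES Company_Role(ID),
    PRIMARY KEY (CompanyID, RoleID)
);
INSERT INTO Game_Company VALUES (1, 'Retro Studios'), (2, 'HAL Laboratories'), (3, 'Nintendo');
INSERT INTO Company_Role VALUES (1, 'Developer'), (2, 'Publisher');
INSERT INTO Game_Company_Role VALUES (1, 1), (2, 1), (3, 1), (3, 2);
""")


def companies_with_role(role_name):
    """Return company names holding the given role, ordered by company ID."""
    rows = conn.execute(
        """
        SELECT gc.Name
        FROM Game_Company gc
        JOIN Game_Company_Role gcr ON gcr.CompanyID = gc.ID
        JOIN Company_Role cr       ON cr.ID = gcr.RoleID
        WHERE cr.Name = ?
        ORDER BY gc.ID
        """,
        (role_name,),
    )
    return [name for (name,) in rows]


print(companies_with_role("Developer"))
print(companies_with_role("Publisher"))
```

"Nintendo" is stored once in Game_Company and appears in both result lists because it has two rows in the link table.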



Answer 2:


There is nothing wrong with your two tables.

In fact all you need is

developer(name) -- company [name] is a developer
publisher(name) -- company [name] is a publisher

Your changes have nothing to do with normalization. Normalization never creates new column names. 'I don't want "Nintendo" to be duplicated' is misconceived. There is nothing wrong per se with values appearing in multiple places. See the answers by sqlvogel & myself here.

BUT: Depending on what it means for a row to be in one of your tables, there might be a better design to reduce errors, because the two tables' values could be "constrained", i.e. depend on each other. That has something to do with "redundancy", but it is about constraints and does not involve normalization. And for us to address it you have to tell us exactly when a row goes into each table based on the world situation.

If you don't want to repeat the strings for implementation(-dependent) reasons (space taken or speed of operations at the expense of more joins) then add a table of name ids and strings (actually company ids and names) and replace your old name columns and values by company id columns and values. But that's not normalization, that's complicating your schema for the sake of implementation-dependent data optimization tradeoffs. (And you should demonstrate this is needed and works.)

The accepted answer just adds a lot of redundant data, just as your question adds three redundant tables. The two tables already say which companies are developers and which are publishers. The other tables are just views/queries on the two!
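To illustrate the "views/queries" point, here is a hedged sketch using Python's sqlite3 module. Only `developer(name)` and `publisher(name)` are base tables; the view names `company` and `company_role` are hypothetical, and the sample rows are borrowed from Answer 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The only two base tables.
CREATE TABLE developer (name TEXT PRIMARY KEY);  -- company [name] is a developer
CREATE TABLE publisher (name TEXT PRIMARY KEY);  -- company [name] is a publisher
INSERT INTO developer VALUES ('Retro Studios'), ('HAL Laboratories'), ('Nintendo');
INSERT INTO publisher VALUES ('Nintendo');

-- Derived, not stored: every company that has at least one role.
CREATE VIEW company AS
    SELECT name FROM developer
    UNION
    SELECT name FROM publisher;

-- Derived, not stored: one row per (company, role) pair.
CREATE VIEW company_role AS
    SELECT name, 'Developer' AS role FROM developer
    UNION ALL
    SELECT name, 'Publisher' AS role FROM publisher;
""")

companies = [r[0] for r in conn.execute("SELECT name FROM company ORDER BY name")]
both = [r[0] for r in conn.execute(
    "SELECT name FROM developer INTERSECT SELECT name FROM publisher")]
nintendo_roles = conn.execute(
    "SELECT name, role FROM company_role WHERE name = 'Nintendo' ORDER BY role"
).fetchall()

print(companies)
print(both)
print(nintendo_roles)
```

"Nintendo" appears in both base tables, but each row makes a different statement, and the company list and role list are computed from them on demand.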

If you want a new table for "[id] identifies a company named [name] with ..." then this is a case of developer and publisher as subtypes of the supertype company. Search on database subtypes. See this answer. Then you would use company id instead of name to identify companies. You could also then further simplify (!) by using company id as the only column in tables developer and publisher, and also everywhere else instead of developer_id and publisher_id.
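A minimal sketch of that supertype/subtype layout, again with Python's sqlite3 module. The column name `company_id` and the constraints are assumptions for illustration; the subtype tables hold nothing but the company id, which is both their primary key and a foreign key to `company`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
-- Supertype: [company_id] identifies a company named [name].
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
-- Subtypes: membership alone states "this company is a developer/publisher".
CREATE TABLE developer (company_id INTEGER PRIMARY KEY REFERENCES company(company_id));
CREATE TABLE publisher (company_id INTEGER PRIMARY KEY REFERENCES company(company_id));

INSERT INTO company VALUES (1, 'Retro Studios'), (2, 'Nintendo');
INSERT INTO developer VALUES (1), (2);
INSERT INTO publisher VALUES (2);
""")

publisher_names = [r[0] for r in conn.execute("""
    SELECT c.name
    FROM publisher p
    JOIN company c ON c.company_id = p.company_id
    ORDER BY c.name
""")]

# A subtype row for an unknown company id is rejected by the foreign key.
try:
    conn.execute("INSERT INTO publisher VALUES (99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(publisher_names, fk_rejected)
```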

"Redundancy" is not about values appearing in multiple places. It is about multiple rows stating the same thing about the application. When using a design like that there are two basic problems: to say certain things multiple rows are involved (while the normalized version involves just one row); and there is no way to say just one of the things at a time (which normalization can help with). If you make two different independent statements about Nintendo then you need two tables and Nintendo mentioned in each one. Re rows making statements about the application see this. (And search my other answers re a table's "statement" or "criterion".) Normalization helps because it replaces tables whose rows state things of the form "... AND ..." by other tables that state the "..." separately. See this and this. (Normalization is commonly erroneously thought to involve or include avoiding multiple similar columns, avoiding columns whose values have repetitive structure and/or replacing strings by ids, but although these can be good design ideas they're not normalization.)


In comments, chat and another answer you gave this starting point:

Here's the simplest design. (I'll assume game titles are not unique so you need game_ids.)

-- game [game_id] with title [title] released on [release_date] is rated [rating]
game(game_id,title,release_date,rating)
game_developer(game_id,name) -- game [game_id] is developed by company [name]
game_publisher(game_id,name) -- game [game_id] is published by company [name]
game_platform(game_id,name) -- game [game_id] is on platform [name]
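The four tables above can be sketched as follows with Python's sqlite3 module. The column types and key constraints are assumptions, and every row is a placeholder rather than real release data. Companies and platforms are identified by name, so no extra id columns are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE game (
    game_id      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    release_date TEXT,
    rating       TEXT
);
CREATE TABLE game_developer (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    name    TEXT    NOT NULL,
    PRIMARY KEY (game_id, name)
);
CREATE TABLE game_publisher (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    name    TEXT    NOT NULL,
    PRIMARY KEY (game_id, name)
);
CREATE TABLE game_platform (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    name    TEXT    NOT NULL,
    PRIMARY KEY (game_id, name)
);

-- Placeholder rows for illustration only.
INSERT INTO game VALUES (1, 'Example Game A', '2000-01-01', 'E'),
                        (2, 'Example Game B', '2001-01-01', 'T');
INSERT INTO game_developer VALUES (1, 'Nintendo'), (2, 'Retro Studios');
INSERT INTO game_publisher VALUES (1, 'Nintendo'), (2, 'Nintendo');
INSERT INTO game_platform  VALUES (1, 'Platform X'), (2, 'Platform X');
""")

# Games whose developer is also their publisher.
self_published = [r[0] for r in conn.execute("""
    SELECT g.title
    FROM game g
    JOIN game_developer d ON d.game_id = g.game_id
    JOIN game_publisher p ON p.game_id = g.game_id AND p.name = d.name
    ORDER BY g.game_id
""")]

# Games published by a given company.
published_by_nintendo = [r[0] for r in conn.execute("""
    SELECT g.title
    FROM game g
    JOIN game_publisher p ON p.game_id = g.game_id
    WHERE p.name = 'Nintendo'
    ORDER BY g.game_id
""")]

print(self_published)
print(published_by_nintendo)
```

"Nintendo" shows up in both game_developer and game_publisher, yet no row repeats another: each one states a different fact about a game.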

Only if you want a separate list of companies so that a company can exist without developing or publishing and/or can have its own data do you need to add:

company(name,...) -- [name] identifies a company

Only if you want role-specific data for developers and publishers do you need to add:

developer(name,...) -- developer [name] has ...
publisher(name,...) -- publisher [name] has ...

The relevant foreign keys of the various options are straightforward.
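As one possible reading of those foreign keys, the sketch below (Python's sqlite3 module) assumes all the optional tables are present: role tables reference `company`, and the game link tables reference the role tables. The elided "..." columns are left out, `game` is reduced to two columns for brevity, and 'Idle Holdings' is a made-up company with no role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE game      (game_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE company   (name TEXT PRIMARY KEY);
CREATE TABLE developer (name TEXT PRIMARY KEY REFERENCES company(name));
CREATE TABLE publisher (name TEXT PRIMARY KEY REFERENCES company(name));
CREATE TABLE game_developer (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    name    TEXT    NOT NULL REFERENCES developer(name),
    PRIMARY KEY (game_id, name)
);
CREATE TABLE game_publisher (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    name    TEXT    NOT NULL REFERENCES publisher(name),
    PRIMARY KEY (game_id, name)
);

INSERT INTO game VALUES (1, 'Example Game');
INSERT INTO company VALUES ('Nintendo'), ('Idle Holdings');
INSERT INTO developer VALUES ('Nintendo');
INSERT INTO publisher VALUES ('Nintendo');
INSERT INTO game_developer VALUES (1, 'Nintendo');
INSERT INTO game_publisher VALUES (1, 'Nintendo');
""")


def try_insert(sql):
    """Run one INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A company may exist without developing or publishing anything.
companies_without_role = [r[0] for r in conn.execute("""
    SELECT name FROM company
    WHERE name NOT IN (SELECT name FROM developer)
      AND name NOT IN (SELECT name FROM publisher)
""")]

# But it cannot publish a game until it is listed as a publisher.
accepted = try_insert("INSERT INTO game_publisher VALUES (1, 'Idle Holdings')")

print(companies_without_role, accepted)
```

If only the `company` table is added, the link tables would reference `company(name)` directly instead.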

None of your versions need _ids. Your versions 2 & 3 won't work because they don't say what companies develop a game or what companies publish a game. You don't need roles, but if you have them (Version 2) then you need a table "game [game_id] has company [name] as [role]". Otherwise (Version 3) you need tables for "game [game_id] is developed by company [name]" and "game [game_id] is published by company [name]". Wherever you differ from my designs ask yourself why you have additional structure and why you can do without it and (possibly) why you would explicitly want it anyway.




Answer 3:


This is a somewhat generic approach to the problem that may be of interest. As @Dour High Arch has pointed out in his solution, Developer and Publisher are just roles for a 'party'. Each party has 0, 1 or more roles with a given product, and roles may overlap. This is good and bad. For example, a product may be developed by 5 developers but published by at most 1 publisher. I have chosen to introduce a serial_id as a system-generated PK, but this is not mandatory. You could use the 3 FKs as a PK and not use the serial_id.

Note that having a party as a generalization of different entity types is not always good, since any column that is not common to all parties has to be made non-mandatory. However, this is very common in real applications.

Convention:

name_PK = Primary Key,

name_FK = Foreign Key




Answer 4:


Here are three final solutions as proposed by the comments. You can see the table being broken down from the top "un-normalized" table.

The rules are as follows:

  • 1 game can have 1 or many developers and 1 developer can have 1 or many games.
  • 1 game can have 1 or many publishers and 1 publisher can have 1 or many games.
  • 1 game can have 1 or many platforms and 1 platform can have 1 or many games.
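A hedged sketch of the first rule using Python's sqlite3 module; the other two rules would follow the same pattern. The table and column names are assumptions and the rows are placeholders. A link table with a composite key gives the many-to-many part, but plain keys do not enforce the "1 or" minimum, so the sketch checks it with a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE game      (game_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE developer (name TEXT PRIMARY KEY);
-- Link table: a game may have many developers and a developer many games.
CREATE TABLE game_developer (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    name    TEXT    NOT NULL REFERENCES developer(name),
    PRIMARY KEY (game_id, name)
);

-- Placeholder rows: game 2 deliberately has no developer yet.
INSERT INTO game VALUES (1, 'Example Game A'), (2, 'Example Game B');
INSERT INTO developer VALUES ('Nintendo'), ('Retro Studios');
INSERT INTO game_developer VALUES (1, 'Nintendo'), (1, 'Retro Studios');
""")

developers_of_game_1 = [r[0] for r in conn.execute(
    "SELECT name FROM game_developer WHERE game_id = 1 ORDER BY name")]

# Games that break the "at least 1 developer" rule.
games_without_developer = [r[0] for r in conn.execute("""
    SELECT g.title
    FROM game g
    WHERE NOT EXISTS (
        SELECT 1 FROM game_developer gd WHERE gd.game_id = g.game_id
    )
    ORDER BY g.game_id
""")]

print(developers_of_game_1)
print(games_without_developer)
```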

Version 1

I left the 2 "Nintendo" entries in red. According to research and implementation, this is not technically redundant data. See my comments under philipxy's answer. This looks simple and elegant. 4 tables with a many-to-many relationship.

Here is the relationship diagram (4 tables and 3 link tables):

Version 2

Version 1 "repeats" "Nintendo" but Version 2 has a "Company" table instead. Compare the 2 different versions. What is the right way?

Version 3

Here is the subtyping philipxy was talking about. How is this version?



Source: https://stackoverflow.com/questions/27750376/elegant-normalization-without-adding-fields-extra-table-best-relationship
