Understanding Normalization & Duplicates - I Guess I Don't - Adding Artist & Title Ids [closed]

Posted by 一世执手 on 2019-11-26 23:33:59

Question


I began with a table listing the top 100 songs by date for the years 1958 through 1980. For each date, there are 100 records. Obviously many will be duplicates, as a song changes position from week to week. The artists will also be duplicated numerous times (think Elvis). There are about 116,000 records in the table.

This table had the following fields

uniq,
date,
artist,
title,
position

To eliminate duplicates (normalization as I understand it) I have modified the table so that it now looks like this

uniq,
date,
artistcode,
titlecode,
position

I also have two new tables, artists and titles. The artists table looks like this

artist,
artistcode

And titles looks like this

title,
titlecode

To get started in the right direction, I simply want to reassemble (join) these tables so that I have a view that looks like the original table, i.e.

uniq,
date,
artist,
title,
position

and has those 116000 records. After reading a couple of books and working with several tutorials, I have come to the conclusion that I have a misconception of what normalization should do, or I am simply headed in the wrong direction.

The SQL syntax to create the view would be much appreciated.


Answer 1:


To get back to the original output with the multiple tables, you can use the following syntax with JOINs

SELECT s.uniq, s.date, a.artist, t.title, s.position
FROM songs AS s
JOIN artists AS a ON a.artistcode = s.artistcode
JOIN titles AS t ON t.titlecode = s.titlecode

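If you want one row per song title rather than one row per chart entry, you can add this to the query. Note that databases enforcing standard GROUP BY rules will then require the other selected columns to be dropped or wrapped in aggregates such as MIN or MAX: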

GROUP BY t.title
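
As a rough sketch of the two queries above, the following uses Python's built-in sqlite3 module so it can be run as-is. The three rows, the names 'Artist A' / 'Song X', and the column types are made up for illustration; the table name songs follows this answer. The grouped query groups by artist as well as title and aggregates position, which is one way (an assumption here, not the only way) to satisfy strict GROUP BY rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (artistcode INTEGER PRIMARY KEY, artist TEXT NOT NULL);
CREATE TABLE titles  (titlecode  INTEGER PRIMARY KEY, title  TEXT NOT NULL);
CREATE TABLE songs (
    uniq       INTEGER PRIMARY KEY,
    date       TEXT    NOT NULL,
    artistcode INTEGER NOT NULL REFERENCES artists(artistcode),
    titlecode  INTEGER NOT NULL REFERENCES titles(titlecode),
    position   INTEGER NOT NULL
);
-- Made-up sample data: one song charting on two dates, another on one.
INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
INSERT INTO titles  VALUES (1, 'Song X'), (2, 'Song Y');
INSERT INTO songs VALUES
    (1, '1958-01-06', 1, 1, 3),
    (2, '1958-01-13', 1, 1, 1),
    (3, '1958-01-13', 2, 2, 2);
""")

# The join from the answer: one output row per chart entry.
joined = conn.execute("""
    SELECT s.uniq, s.date, a.artist, t.title, s.position
    FROM songs AS s
    JOIN artists AS a ON a.artistcode = s.artistcode
    JOIN titles  AS t ON t.titlecode  = s.titlecode
    ORDER BY s.uniq
""").fetchall()

# One row per song: every non-grouped column is aggregated.
per_song = conn.execute("""
    SELECT a.artist, t.title,
           MIN(s.position) AS best_position,
           COUNT(*)        AS chart_entries
    FROM songs AS s
    JOIN artists AS a ON a.artistcode = s.artistcode
    JOIN titles  AS t ON t.titlecode  = s.titlecode
    GROUP BY a.artist, t.title
    ORDER BY a.artist, t.title
""").fetchall()

for row in joined:
    print(row)
for row in per_song:
    print(row)
```

The first query returns as many rows as songs has, which is what the question asks for; the second collapses them to one row per artist/title pair.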



Answer 2:


What "duplicates"? There is nothing wrong per se with the same value appearing multiple times. You need to begin reading some academic textbook(s)/slides/course(s) about information modeling and relational databases.

Each row that is in or not in a table makes a statement about the situation. The sort of "duplicate" and "redundancy" problems that normalization addresses arise sometimes when multiple rows of the same table say the same thing about the situation. (Which might or might not involve subrow values appearing multiple times.)

E.g., suppose you had a table like this one but with an additional column, and a given artist/title combination always appeared with the same value in that column (say an artist never has multiple recordings with the same title charting, and you added the playing time of each recording). Then there would be a problem. ("... AND recording artist/title is time minutes long")

Or suppose you had a table like this one but with an additional column, and a value in it always appeared with the same artist/title combination (say you added a recording id). Then there would also be a problem. ("... AND recording recordingcode is of title title by artist artist")

Right now there is no problem. What do you expect as an answer? The answer is that normalization says there's no problem, and your impressions are not informed by normalization.
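
To make the first hypothetical concrete, here is a small sketch using Python's built-in sqlite3 module. The table names hit_wide, recording and hit, the minutes column, and all rows are invented for illustration. It shows the kind of redundancy normalization is about: the playing time of a recording is stated once per chart entry, so one update can make two rows contradict each other, while the decomposed tables state it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical wide table: minutes depends only on (artist, title).
CREATE TABLE hit_wide (
    uniq INTEGER PRIMARY KEY, date TEXT, artist TEXT, title TEXT,
    position INTEGER, minutes REAL
);
INSERT INTO hit_wide VALUES
    (1, '1958-01-06', 'Artist A', 'Song X', 3, 2.5),
    (2, '1958-01-13', 'Artist A', 'Song X', 1, 2.5),
    (3, '1958-01-13', 'Artist B', 'Song Y', 2, 3.0);

-- Decomposition: the playing time is stated once per recording.
CREATE TABLE recording AS
    SELECT DISTINCT artist, title, minutes FROM hit_wide;
CREATE TABLE hit AS
    SELECT uniq, date, artist, title, position FROM hit_wide;
""")

# Update anomaly: changing one row of the wide table leaves the other
# row for the same recording saying something different.
conn.execute("UPDATE hit_wide SET minutes = 2.75 WHERE uniq = 1")
conflicts = conn.execute("""
    SELECT artist, title, COUNT(DISTINCT minutes)
    FROM hit_wide
    GROUP BY artist, title
    HAVING COUNT(DISTINCT minutes) > 1
""").fetchall()

# The decomposed tables still join back to one row per chart entry.
recordings = conn.execute("SELECT COUNT(*) FROM recording").fetchone()[0]
rejoined = conn.execute("""
    SELECT h.uniq, h.date, h.artist, h.title, h.position, r.minutes
    FROM hit AS h
    JOIN recording AS r ON r.artist = h.artist AND r.title = h.title
    ORDER BY h.uniq
""").fetchall()

print(conflicts)
print(recordings, len(rejoined))
```

Note that the decomposition keeps artist and title as plain values; no ids are needed to remove this redundancy.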

Normalization does not involve replacing values by ids. Introduced id values have exactly the same pattern of appearances as the values they identify/replaced, so that doesn't "eliminate duplicates", and it adds more "duplicates" of the ids in new tables. The original table as a view is a projection of a join of the new tables on equality of ids. (You might want to have ids for ease of update or data compression (etc) at the expense of more tables & joins (etc). That's a separate issue.)
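
The point about ids can be checked on a toy example. The sketch below (Python's built-in sqlite3; the table names original and coded and all rows are made up) replaces artist names by codes and then counts how often each name and each code appears. Only the artist column is coded here to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE original (
    uniq INTEGER PRIMARY KEY, date TEXT, artist TEXT, title TEXT,
    position INTEGER
);
INSERT INTO original VALUES
    (1, '1958-01-06', 'Artist A', 'Song X', 3),
    (2, '1958-01-13', 'Artist A', 'Song X', 1),
    (3, '1958-01-13', 'Artist B', 'Song Y', 2);

-- Introduce an id per distinct artist name.
CREATE TABLE artists (artistcode INTEGER PRIMARY KEY, artist TEXT UNIQUE);
INSERT INTO artists (artist) SELECT DISTINCT artist FROM original;

-- Same rows as before, with the name swapped for its id.
CREATE TABLE coded AS
    SELECT o.uniq, o.date, a.artistcode, o.title, o.position
    FROM original AS o
    JOIN artists AS a ON a.artist = o.artist;
""")

# How many times does each name / each code appear?
name_counts = conn.execute(
    "SELECT COUNT(*) AS n FROM original GROUP BY artist ORDER BY n"
).fetchall()
code_counts = conn.execute(
    "SELECT COUNT(*) AS n FROM coded GROUP BY artistcode ORDER BY n"
).fetchall()

# Stored artist values before, and stored code values after (both tables).
values_before = conn.execute("SELECT COUNT(artist) FROM original").fetchone()[0]
values_after = (
    conn.execute("SELECT COUNT(artistcode) FROM coded").fetchone()[0]
    + conn.execute("SELECT COUNT(artistcode) FROM artists").fetchone()[0]
)

print(name_counts, code_counts)
print(values_before, values_after)
```

The codes repeat exactly as often as the names did, and the lookup table adds one more copy of each code.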

-- hit `uniq` is title `title` by artist `artist` at position `position` on date `date`
/* FORSOME h.*, a.*, t.*,
    hit h.uniq is title with id h.titlecode by artist with id h.artistcode
        at position h.position on date h.date
AND artist a.artist has id a.artistcode AND h.artistcode = a.artistcode
AND title t.title has id t.titlecode AND h.titlecode = t.titlecode
AND `uniq` = h.uniq AND `title` = t.title AND `artist` = a.artist
    AND `position` = h.position AND `date` = h.date
*/
/* FORSOME h.*, a.*, t.*,
    Hit(h.uniq, h.titlecode, h.artistcode, h.position, h.date)
AND Artist(a.artist, a.artistcode) AND h.artistcode = a.artistcode
AND Title(t.title, t.titlecode) AND h.titlecode = t.titlecode
AND `uniq` = h.uniq AND `title` = t.title AND `artist` = a.artist
AND `position` = h.position AND `date` = h.date
*/
create view HitOriginal as
select h.uniq, h.date, a.artist, t.title, h.position
from Hit h
join Artist a on h.artistcode = a.artistcode
join Title t on h.titlecode = t.titlecode
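
One way to convince yourself that the view really reproduces the original table is to compare the two with EXCEPT in both directions. The sketch below does that with Python's built-in sqlite3 module on made-up rows; the table name Original and the way the codes are generated are assumptions for the example, while Hit, Artist, Title and HitOriginal follow the view above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Original (
    uniq INTEGER PRIMARY KEY, date TEXT, artist TEXT, title TEXT,
    position INTEGER
);
INSERT INTO Original VALUES
    (1, '1958-01-06', 'Artist A', 'Song X', 3),
    (2, '1958-01-13', 'Artist A', 'Song X', 1),
    (3, '1958-01-13', 'Artist B', 'Song Y', 2);

-- Lookup tables with generated ids (SQLite assigns them automatically).
CREATE TABLE Artist (artist TEXT UNIQUE, artistcode INTEGER PRIMARY KEY);
CREATE TABLE Title  (title  TEXT UNIQUE, titlecode  INTEGER PRIMARY KEY);
INSERT INTO Artist (artist) SELECT DISTINCT artist FROM Original;
INSERT INTO Title  (title)  SELECT DISTINCT title  FROM Original;

CREATE TABLE Hit AS
    SELECT o.uniq, o.date, a.artistcode, t.titlecode, o.position
    FROM Original AS o
    JOIN Artist AS a ON a.artist = o.artist
    JOIN Title  AS t ON t.title  = o.title;

CREATE VIEW HitOriginal AS
    SELECT h.uniq, h.date, a.artist, t.title, h.position
    FROM Hit AS h
    JOIN Artist AS a ON h.artistcode = a.artistcode
    JOIN Title  AS t ON h.titlecode  = t.titlecode;
""")

# Rows in one relation but not the other; both should be empty.
missing = conn.execute(
    "SELECT * FROM Original EXCEPT SELECT * FROM HitOriginal"
).fetchall()
extra = conn.execute(
    "SELECT * FROM HitOriginal EXCEPT SELECT * FROM Original"
).fetchall()
view_rows = conn.execute("SELECT COUNT(*) FROM HitOriginal").fetchone()[0]

print(missing, extra, view_rows)
```

On the real data, the same two EXCEPT queries plus a row count (about 116,000 expected) would serve as a check after the split.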


Source: https://stackoverflow.com/questions/44530971/understanding-normalization-duplicates-i-guess-i-dont-adding-artist-tit
