Can't decide whether normalization or de-normalization would work

Question

This question may be a bit broad!

I have recently been learning MySQL, and I am building a database of school exam information. I have created a table called subjects that stores information about each subject, including its full name and full marks. I have also created another table called exams that stores the marks each student obtained in each subject. Here, subjects is the parent (master) table and exams is the child table. When the two tables are joined, derived information such as each student's percentage and grade is generated.

But what if certain information in the subjects table changes in the future? Say the full marks for a particular subject are changed. In that case the older records in the exams table will become invalid, because the join will now produce values that are correct for current records but wrong for older ones.

What should I do in this kind of situation? Should I denormalize both tables into a single table? My limited knowledge of database design says that is bad practice!

Any help or insight will be highly appreciated.

Answer 1:

When a subject changes, you can add a new row to the subjects table instead of changing an existing one, so that the subjects table contains versions of subjects. This can be combined with validity dates. You will need a normalization step after applying this approach; this might be what one of the comments meant.
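A minimal sketch of this versioning approach, using Python's built-in sqlite3 module as a stand-in for MySQL (MySQL would typically use DATE columns rather than ISO-date text). The table and column names and the add_version and percentages helpers are hypothetical. A change inserts a new version row, and each exam row references the specific version it was marked against, so results recorded before the change keep their original full marks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical schema: one row per version of a subject.
    CREATE TABLE subject_versions (
        version_id   INTEGER PRIMARY KEY,
        subject_code TEXT NOT NULL,      -- stable identity shared by all versions
        full_name    TEXT NOT NULL,
        full_marks   INTEGER NOT NULL,
        valid_from   TEXT NOT NULL       -- ISO date on which this version took effect
    );
    -- Each exam points at the exact subject version it was marked against.
    CREATE TABLE exams (
        student    TEXT NOT NULL,
        version_id INTEGER NOT NULL REFERENCES subject_versions(version_id),
        marks      INTEGER NOT NULL
    );
""")

def add_version(conn, subject_code, full_name, full_marks, valid_from):
    """Record a subject (or a change to one) as a new row; old versions are never updated."""
    cur = conn.execute(
        "INSERT INTO subject_versions (subject_code, full_name, full_marks, valid_from) "
        "VALUES (?, ?, ?, ?)", (subject_code, full_name, full_marks, valid_from))
    return cur.lastrowid

def percentages(conn):
    """Join each exam to its own version, so old percentages are unaffected by changes."""
    return conn.execute("""
        SELECT e.student, v.full_name, e.marks * 100.0 / v.full_marks
        FROM exams e JOIN subject_versions v ON v.version_id = e.version_id
        ORDER BY e.student
    """).fetchall()

v1 = add_version(conn, "MATH", "Mathematics", 50, "2015-01-01")
conn.execute("INSERT INTO exams VALUES ('alice', ?, 45)", (v1,))
v2 = add_version(conn, "MATH", "Mathematics", 100, "2016-01-01")  # full marks changed
conn.execute("INSERT INTO exams VALUES ('bob', ?, 45)", (v2,))
print(percentages(conn))
```

Because full_name is repeated in every version, this is where the normalization step comes in: the parts of a subject that never change can be moved into their own table.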

Answer 2:

You need to model your historical situation(s), just as you modeled your current situation.

You may or may not choose to denormalize past data. But the greatest benefit per unit of effort comes from having the historical-situation tables look like, or actually be, the current-situation tables. This involves investing in further normalizing current rows into subrows, which you then extend with a date so that you can join together subrows from the same date.
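One possible reading of this, sketched with sqlite3 standing in for MySQL: the unchanging part of a subject stays in subjects, the part that changes (full marks) becomes a subrow table extended by an effective date, and each exam is joined to the full-marks subrow that was in effect on its exam date. All table names, column names, dates and marks here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subjects (
        subject_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL
    );
    -- Dated subrow: one row per change in full marks.
    CREATE TABLE subject_full_marks (
        subject_id     INTEGER NOT NULL REFERENCES subjects(subject_id),
        effective_date TEXT NOT NULL,          -- ISO dates compare correctly as text
        full_marks     INTEGER NOT NULL,
        PRIMARY KEY (subject_id, effective_date)
    );
    CREATE TABLE exams (
        student    TEXT NOT NULL,
        subject_id INTEGER NOT NULL REFERENCES subjects(subject_id),
        exam_date  TEXT NOT NULL,
        marks      INTEGER NOT NULL
    );
    INSERT INTO subjects VALUES (1, 'Mathematics');
    INSERT INTO subject_full_marks VALUES (1, '2015-01-01', 50), (1, '2016-01-01', 100);
    INSERT INTO exams VALUES ('alice', 1, '2015-06-01', 45), ('bob', 1, '2016-06-01', 45);
""")

# Join each exam to the latest full-marks subrow in effect on its exam date.
rows = conn.execute("""
    SELECT e.student, s.full_name, e.marks * 100.0 / fm.full_marks AS pct
    FROM exams e
    JOIN subjects s ON s.subject_id = e.subject_id
    JOIN subject_full_marks fm
      ON fm.subject_id = e.subject_id
     AND fm.effective_date = (
           SELECT MAX(f2.effective_date) FROM subject_full_marks f2
           WHERE f2.subject_id = e.subject_id AND f2.effective_date <= e.exam_date)
    ORDER BY e.student
""").fetchall()
print(rows)
```

Changing full marks now means inserting a new subject_full_marks row; no existing row is updated, and the exams table needs no version column.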

If duplicated data is, or becomes, a demonstrated burden, you can split a table into multiple tables, so that the subrows affected by the most recent change are dated with that change while the other subrows remain dated as of an earlier change. Then you can join up the most recent rows that agree as of a given date. In some "temporal" databases we minimize redundant data by splitting rows into subrows and not just dating them but labeling them with the date range during which they were current.
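A sketch of the split-and-date-range idea, again using sqlite3 in place of MySQL with hypothetical table and helper names: a subject's name and its full marks change independently, so each gets its own table whose rows carry a half-open validity period [valid_from, valid_to), with NULL in valid_to marking the row that is current now. Renaming a subject therefore does not duplicate its full marks, and a subject as of any date is the join of the rows whose periods contain that date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject_names (
        subject_id INTEGER NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,                 -- NULL: still current
        full_name  TEXT NOT NULL,
        PRIMARY KEY (subject_id, valid_from)
    );
    CREATE TABLE subject_full_marks (
        subject_id INTEGER NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,                 -- NULL: still current
        full_marks INTEGER NOT NULL,
        PRIMARY KEY (subject_id, valid_from)
    );
    INSERT INTO subject_names VALUES
        (1, '2015-01-01', '2017-01-01', 'Maths'),
        (1, '2017-01-01', NULL, 'Mathematics');
    INSERT INTO subject_full_marks VALUES
        (1, '2015-01-01', '2016-01-01', 50),
        (1, '2016-01-01', NULL, 100);
""")

def change_full_marks(conn, subject_id, day, full_marks):
    """Close the current full-marks period and open a new one in a single transaction."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute("UPDATE subject_full_marks SET valid_to = ? "
                     "WHERE subject_id = ? AND valid_to IS NULL", (day, subject_id))
        conn.execute("INSERT INTO subject_full_marks VALUES (?, ?, NULL, ?)",
                     (subject_id, day, full_marks))

def subject_as_of(conn, subject_id, day):
    """Join the name row and full-marks row whose validity periods both contain `day`."""
    return conn.execute("""
        SELECT n.full_name, m.full_marks
        FROM subject_names n
        JOIN subject_full_marks m ON m.subject_id = n.subject_id
        WHERE n.subject_id = :sid
          AND n.valid_from <= :day AND (n.valid_to IS NULL OR :day < n.valid_to)
          AND m.valid_from <= :day AND (m.valid_to IS NULL OR :day < m.valid_to)
    """, {"sid": subject_id, "day": day}).fetchone()

change_full_marks(conn, 1, "2018-01-01", 80)
for day in ("2015-06-01", "2016-06-01", "2017-06-01", "2018-06-01"):
    print(day, subject_as_of(conn, 1, day))
```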

From a recent answer of mine:

Proposals to hard delete assume that you keep the desired historical data. Do not limit your thinking about how to achieve this to merely nulling FKs, cascading, adding a flag or date column to an existing table, or anything else. Properly model both the present and the past, including the database changes that need to occur as a DBMS transaction upon each chosen application situation change. Proposals to soft delete just involve putting certain current and historical data into the same table rather than into different ones. This only works for very simple models of current and historical situations.
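As one illustration of treating a situation change as a single DBMS transaction, here is a sketch using sqlite3 in place of MySQL. The withdraw_subject helper, the history tables and the scenario of withdrawing a subject are all hypothetical. Withdrawing a subject copies the subject and its exam results into separate history tables and then hard-deletes them from the current tables; either every step commits or none does, so the foreign key from exams to subjects is never simply nulled or cascaded away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE subjects (subject_id INTEGER PRIMARY KEY,
                           full_name TEXT NOT NULL, full_marks INTEGER NOT NULL);
    CREATE TABLE exams (student TEXT NOT NULL,
                        subject_id INTEGER NOT NULL REFERENCES subjects(subject_id),
                        marks INTEGER NOT NULL);
    -- History tables mirror the current ones, plus the date the subject was withdrawn.
    CREATE TABLE subjects_history (subject_id INTEGER, full_name TEXT,
                                   full_marks INTEGER, withdrawn_on TEXT NOT NULL);
    CREATE TABLE exams_history (student TEXT, subject_id INTEGER, marks INTEGER);
    INSERT INTO subjects VALUES (1, 'Latin', 50), (2, 'Mathematics', 100);
    INSERT INTO exams VALUES ('alice', 1, 45), ('alice', 2, 80);
""")

def withdraw_subject(conn, subject_id, day):
    """Snapshot a subject and its exams into history, then hard-delete them, atomically."""
    with conn:  # all four statements commit together or roll back together
        conn.execute("INSERT INTO subjects_history "
                     "SELECT subject_id, full_name, full_marks, ? FROM subjects "
                     "WHERE subject_id = ?", (day, subject_id))
        conn.execute("INSERT INTO exams_history SELECT * FROM exams "
                     "WHERE subject_id = ?", (subject_id,))
        conn.execute("DELETE FROM exams WHERE subject_id = ?", (subject_id,))
        conn.execute("DELETE FROM subjects WHERE subject_id = ?", (subject_id,))

withdraw_subject(conn, 1, "2016-07-01")

# Past results remain queryable from the history tables.
history = conn.execute("""
    SELECT e.student, s.full_name, e.marks * 100.0 / s.full_marks
    FROM exams_history e JOIN subjects_history s USING (subject_id)
""").fetchall()
print(history)
```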

It is usually straightforward to design a database for the current application situation only. But if we do care about the past, we typically care about only some of it. If so, upon certain application situation changes from current to past, we can copy a snapshot of the relevant current state into historical state. Labeling data with soft-delete flags versus dates is the combined-table version of undated versus dated historical data: in both cases we only distinguish current from past situations, and a flag records only that a change occurred while a date also records when.
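A small sketch of the flag-versus-date contrast in a combined table, with sqlite3 standing in for MySQL. The is_withdrawn and withdrawn_on columns and the offered_on helper are hypothetical, and both columns appear in one table only to compare them; a real design would pick one. The sketch also assumes every subject was offered from the start, since no start date is recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Combined-table design: current and past subjects live in one table.
    CREATE TABLE subjects (
        subject_id   INTEGER PRIMARY KEY,
        full_name    TEXT NOT NULL,
        is_withdrawn INTEGER NOT NULL DEFAULT 0,  -- soft-delete flag: records THAT
        withdrawn_on TEXT                          -- soft-delete date: records WHEN (NULL = current)
    );
    INSERT INTO subjects VALUES
        (1, 'Latin',       1, '2016-07-01'),
        (2, 'Mathematics', 0, NULL);
""")

# The flag alone can only answer questions about the present.
current = conn.execute(
    "SELECT full_name FROM subjects WHERE is_withdrawn = 0").fetchall()

def offered_on(day):
    """Subjects not yet withdrawn on `day`; only answerable because the date is stored."""
    return conn.execute(
        "SELECT full_name FROM subjects "
        "WHERE withdrawn_on IS NULL OR withdrawn_on > ? ORDER BY subject_id",
        (day,)).fetchall()

print(current, offered_on("2016-01-01"), offered_on("2017-01-01"))
```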

"Temporal" databases more or less record the current situation and a set of dated, once-current situations. Recording past data using the structure for current data simplifies understanding and querying of both current and past data. (The querying about intervals of time that a temporal database can facilitate can get quite complicated.) But it turns out that making a temporal version of a given current-data design does not just involve adding date columns to the existing current-data tables. It requires remodeling the current data, breaking it into smaller tables with more constraints. This is because different kinds of application situation changes require dating different combinations of columns of the existing current-data design. (Hard and soft historical snapshot designs must also address this, but only for a limited past/history.)
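To illustrate the "more constraints" point, here is a sketch using sqlite3 in place of MySQL; the table name, trigger name and try_insert helper are hypothetical. A dated full-marks table needs a rule that an undated subjects table never did: the validity periods of one subject must not overlap. SQLite has no declarative constraint for this, so the sketch uses a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject_full_marks (
        subject_id INTEGER NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,                          -- NULL: still current
        full_marks INTEGER NOT NULL,
        PRIMARY KEY (subject_id, valid_from),
        CHECK (valid_to IS NULL OR valid_from < valid_to)
    );
    -- Reject an insert whose period [valid_from, valid_to) overlaps an existing
    -- period of the same subject (a NULL valid_to means the period has no end yet).
    -- A full design would add a matching BEFORE UPDATE trigger.
    CREATE TRIGGER no_overlapping_periods
    BEFORE INSERT ON subject_full_marks
    BEGIN
        SELECT CASE WHEN EXISTS (
            SELECT 1 FROM subject_full_marks f
            WHERE f.subject_id = NEW.subject_id
              AND (f.valid_to IS NULL OR NEW.valid_from < f.valid_to)
              AND (NEW.valid_to IS NULL OR f.valid_from < NEW.valid_to)
        ) THEN RAISE(ABORT, 'overlapping validity period') END;
    END;
""")

def try_insert(row):
    """Return True if the period was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO subject_full_marks VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:  # violation raised by the CHECK or the trigger
        return False

ok1 = try_insert((1, "2015-01-01", None, 50))   # first, open-ended period
ok2 = try_insert((1, "2016-01-01", None, 100))  # overlaps the still-open period
with conn:  # close the open period first...
    conn.execute("UPDATE subject_full_marks SET valid_to = '2016-01-01' "
                 "WHERE subject_id = 1 AND valid_to IS NULL")
ok3 = try_insert((1, "2016-01-01", None, 100))  # ...then the new period fits
print(ok1, ok2, ok3)
```

With the trigger in place, forgetting to close the current period before opening a new one is caught by the database instead of silently leaving two current rows.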

Source: https://stackoverflow.com/questions/40211539/cant-decide-whether-normalization-or-de-normalization-would-work

Tags

mysql

database-design