How to best manage historical lookup values in a database?

问题

Overview

An incident database that will have a number of columns holding an ID for a record held in a lookup table.

The problem I'm trying to solve

I need to come up with a robust solution to manage historical data where some fields hold lookup IDs. I've listed my proposed solutions as well as alternatives. I would like to know from other developers if they manage these scenarios in a similar way in their projects. Perhaps you have a better approach?

Database: Oracle 10g

Column: Department_name

Scenario: The department name can change X amount of times through the year. The business have a need to report data for all their departments but wish to see incidents under their respective department names as it was at the time of the incident.

Proposed solution: When setting up an entry in the department name lookup table, set a start and end date value. Using a view, create a calculated field based on the incident date to access the correct department name at any given point in time.

Pros: With a little bit of defensive coding it would enable self service by selected users to manage their static data via a GUI without any additional database changes. On the fly changes can be done e.g. changing the name completely. No DBA support is required.

Cons: Potentially an expensive operation given the volume of lookups/calculations being done over a large dataset.

Alternative solution: Simply use and insert the plain text value of the department name. The drawbacks here would be that DBAs are needed for adhoc requests to change/update values, potentially targeting specific date ranges and missing some records in error. There would also be increased table space consumption.

Column: Assigned_Technician_ID

Scenario: An incident will have one technician assigned where the ID of the technician will be stored. A lookup table will hold a 'current' list of all available technicians. As people leave the business the list has to be refreshed and obsolete technicians removed. This is to keep the number of values in the dropdowns to a minimum. The business will still want to see which technicians were assigned on all of their incident data.

Solution: Instead of deleting an entry from the technician lookup table, mark the entry with a flag which denotes 'archived/deleted'. This flag would act as a filter on the GUI dropdowns to remove unwanted entries.

Pros: The lookup table would only consist of the technician's UID from the employee table. Therefore if business requirements change it would be easy to render any attributes of a technician in the main view e.g. full name or employee number and so on.

Cons: As in the previous example the lookups could potentially be an expensive operation on a large dataset. Additional work would be required on the GUI side in regards to business logic and design. Specifically how to manage dropdowns lists when the original entry has been 'archived'.

Alternative Solution: As in the previous example above, just use the plain text value. Drawbacks here would be greater consumption of table space and less flexible with changing business requirements.

回答1:

There is a technique called versioning that has been around for many years but is largely unworkable for several reasons. However, there is a similar technique I call Version Normal Form which I have found to be very useful. Here is an example using an Employees table.

First, the static table is created. This is the main entity table and it contains static data about the entity. Static data is data that is not expected to change during the life of the entity, such as birth date.

create table Employees(
  ID        int  auto_generated primary key,
  FirstName varchar( 32 ),
  Hiredate  date not null,
  TermDate  date,            -- last date worked
  Birthdate date,
  ...              -- other static data
);

It is important to realize there is one entry for each employee, just as with any such table.

Then the associated version table. This establishes a 1-m relationship with the static table as there could be several versions for an employee.

create table Employee_versions(
  ID         int   not null,
  EffDate    date  not null,
  char( 1 )  IsWorking not null default true,
  LastName   varchar( 32 ),    -- because employees can change last name
  PayRate    currency not null,
  WorkDept   int   references Depts( ID ),
  ...,              -- other changable data
  constraint PK_EmployeeV primary key( ID, EffDate )
);

In the version table note there is an effective date but not a matching no-longer-effective field. This is because once a version takes effect, it stays in effect until replaced by the subsequent version. The combination of ID and EffDate must be unique so there cannot be two verions for the same employee that are active at the same time, nor can there be a gap between the time one version ends and when the next version starts.

Most queries will want to know the current version of employee data. This is provided by joining the static row for the employee with the version that is in effect now. This can be found with the following query:

select  ...
from    Employees e
join    Employee_versions v1
    on  v1.ID = e.ID
    and v1.EffDate =(
        select  Max( v2.EffDate )
        from    EmployeeVersions v2
        where   v2.ID = v1.ID
            and v2.EffDate <= NOW()
    )
where  e.ID = :EmpID;

This returns the one and only one version that started in the most recent past. Using the inequality <= in the date check (v2.EffDate <= NOW()) allows for effective dates in the future. Suppose you know a new employee will start on the first day of next month or a raise in pay is scheduled for the 13th of next month, this data can inserted ahead of time. Such "preloaded" entries will be ignored.

Don't let the subquery get to you. All the search fields are indexed so the result is quite fast.

There is a lot of flexibility with this design. The query above returns the latest data of all employees, present and past. You could check the TermDate field to get just present employees. In fact, since a good many places in your apps will only be interested in the current info of current employees, that query would make a good view (omit the final where clause). No need for the apps to even know such versions exist.

If you have a particular date and you want to see the data that was in effective at that time, then change the v2.EffDate <= NOW() in the subquery to v2.EffDate <= :DateOfInterest.

More details can be found in a slide presentation here and a not-quite-completed document here.

To show off a little of the extensibility of the design, notice there is a IsWorking indicator in the version table as well as a termination date in the static table. When an employee leaves the company, the last date is inserted in the static table and a copy of the latest version with IsWorking set to false is inserted into the version table.

It's fairly common for employees to leave a company for a while then get hired again. With just the date in the static table, the entry can be activated again just by setting that date back to NULL. But a "look back" query for any time when the person was no longer an employee would return a result. There would be no indication that they had left the company. But a version with IsWorking = false when leaving the company and IsWorking = true when returning to the company will allow a check of that value at the time of interest and ignore employees when they were no longer an employee even if they returned later.

回答2:

I am a SQL Server Developer and run into these issues all of the time. I am never a fan of using the text (denormalizing) unless I am loading data to a Data Warehouse.

Re: Department_name

Effective date is likely the best answer, what is giving me pause is that I am not sure I understand the question. I can't think of a business case for a department to change names so frequently.

Re: AssignedTech

I use an active flag in almost all of my employee based lookups. I have never had a performance issue doing this. I have used filtered indexes and views when dealing with companies with high turn over.

来源：https://stackoverflow.com/questions/47478331/how-to-best-manage-historical-lookup-values-in-a-database

标签

database

Oracle

database-design