How to implement historical versioning?

问题

We are in the early stages of building a large C# MVC2 app (we also employ Sharp architecture and Nhibernate as part of the ecosystem) on SQL 2008 R2, and one of the requirements is that all database row versions are accessible for a given period of history.

We have toyed with the idea of a layout similar to:

id (PK)
recordId
versionId

and having each edit to a record result in a new record being created with the same recordId and an incremented versionId. Record display would then be done with something along the lines of SELECT ... WHERE recordId = X AND versionId = MAX(versionId)

Snapshotting on every transaction would not work (too many? and not accessible from within the application easily).

But we are curious as to what other implementations have been tried with success, or potential problems with our proposal.

回答1:

You seem to be alluding to a temporal table. Three approaches:

Valid-state table: appending two 'timestamp' columns (e.g. of type DATETIME), one specifying when the row became valid and one specifying when the row stopped being valid, the intervening time is being the period of validity of the row

Transaction-time state table: associates with each row the period of time that row was present in the monitored table, thereby allowing the state of the monitored table at any previous point in time to be reconstructed.

Bitemporal table: capturing both valid time and transaction time, simultaneously records the history of the enterprise, while also capturing the sequence of changes to the record of that history.

Source: Developing Time-Oriented Database Applications in SQL (Richard T Snodgrass).

回答2:

We have a system developed by our DBAs that works as a trigger on update/delete. There is a secondary table that nearly mirrors the table being audited (aside from some other details like transaction time, login used to do the update, server, etc.). Anytime someone makes a change, it is logged in the audit version of the table via the trigger. It's kind of annoying to have to keep the audit trigger up to date anytime the schema changes, but c'est la vie.

The good thing about this is that applications do not need to be concerned with this auditing at all ... so it keeps the concept count for application code low(er).

This is in production and works on tables where transactions number in the tens of thousands per day. Your mileage may of course vary based on a number of things like the size of your server, and the nature of your data, but it works for us :-)

回答3:

Rather then versionId which requires you to do self-join to get max version, I would introduce the validFrom - validTo pair. This requires you to update (end) current record's validTo when inserting new version of row, but allows to easily select current data with where @now >= validFrom and @now < validTo or the data in any historical time.

You can either have these historical records in a separate table or not. If you want to have only one table which includes all versions of row, which is better maintainable and usable, you may want to have a look at SQL Server Partitioning (Partitioned Tables) which allow you to separate recent history from old history and optimize searching on it.

回答4:

I had a similar problem where i need to audit every change to a set of tables.

The biggest problem we had was the performance when trying to use only NHibernate features to manager massive inserts and updates (because the UI required them). So our solution was to use TRIGGERS to audit all the information on the tables and the response time is incredible comparing any solution we could came up on that time with NH.

If someone ask me how to do it, I would say triggers is the way to audit your data.

回答5:

If you have SQL 2008 Enterprise, and depending on your intent, Change Data Capture (CDC) could be worth a look.

It really depends whether you are retaining previous versions for audit purposes, or for some other reason.

回答6:

My enterprise also uses a trigger based approach for both auditing and history. In addition to the core table in the enterprise warehouse, every significant table has an audit table in a separate database. This audit table has 1 row for every transaction. Some of our tables also have a historical version in a third database. The Audit database is purely for after-the-fact trouble shooting and non repudiation -- but it is difficult and non-performant to query for data analysis. Our history database is optimised to answer point-in-time queries very efficiently. All of this is 100% scripted by a .net tool I wrote, so when we alter the schema or add new tables to history, we just re-script the affected triggers.

来源：https://stackoverflow.com/questions/3208064/how-to-implement-historical-versioning

标签

sql

nhibernate

asp.net-mvc-2

s#arp-architecture