Question
I have a table called Bookings. This table contains data representing a booking made for a particular service, with many variables.
A while ago I ran into a problem with my data structure: any change to a booking that affected times, dates or prices would also affect associated financial records, booking lists for dates, etc.
My solution at the time was to create a Modifications table that tracks any changes made to a Booking. Then, whenever the Booking model is asked to return a booking, it adds on the Modifications made (in the afterFind() Cake callback) and presents the most up-to-date version of the booking, something like this (excuse the Paint drawing):

This method works fine when you ask the Booking model to return booking #1234. It returns the most up-to-date representation of the booking including all modifications (layered on top of each other), including an array containing all the modifications and the original booking data for reference.
My problem is that I've recently realised I need to be able to query this model with custom conditions. If one of those conditions is satisfied only by a modification, the result won't match, because the model searches the original record rather than the finally presented record. Example where I query the model to return rows where abc is blue (not grey):

In that example, the model looks straight at the original data for rows where abc is blue and doesn't return this result, because the blue value is in a Modification which is attached after the original results are found.
What I've done now is put a query into the beforeFind() callback of the Booking model to look for modifications that match the given criteria, joining the booking to make sure that any other criteria still match. When it finds the blue modification in the example above, it stores that result in an array as a class property and continues with the regular find(), but excludes that booking's ID from being returned (because we've found a more up-to-date version of it). Then it merges them together, sorts them again, etc. in the afterFind().
This works, although it's a little more long-winded than I was hoping for.
After all that, I've realised that in other parts of this application, there are models that are manually joining to the bookings table and searching for bookings. So now I need a way to be able to incorporate the modifications into all of those manual joins straight to the table in MySQL without affecting the original data and preferably without changing too much of my code.
My thought was that I need to remove the manual join and create a model association instead. Will the beforeFind() and afterFind() of the Booking model still run when I query, say, the Customer model which hasMany Bookings (to apply the modifications to each booking)?
My other option was to return more rows from MySQL than necessary by removing any criteria that might be contained in the modifications, then use PHP to filter the results as per my search criteria. This option scared me a little because the result set has the potential to be massive without that criteria...
How can I achieve this data structure? My key requirements are still that I do not want to change the original Booking record, rather add Modification records on top, but I need to be able to query bookings (including modifications) through the model.
I want to keep as much of this integration behind the scenes as possible so I won't have to go through my entire application to change n number of queries that look like this:
$get_blue = $this->Booking->find('all', array(
    'conditions' => array(
        'Booking.abc' => 'blue'
    )
));
I want to be able to implicitly include any modifications made to bookings so that the up-to-date booking will be returned in the above query.
The other problem is when the Booking model is manually joined to a search query, like this:
$get_transactions_on_blue_bookings = $this->Transaction->find('all', array(
    'joins' => array(
        array(
            'table' => 'sql_bookings_table', // non-standard Cake format, I know - it's an example
            'alias' => 'Booking',
            'type' => 'LEFT',
            'conditions' => 'Booking.booking_id = Transaction.booking_id'
        )
    ),
    'conditions' => array(
        'Booking.abc' => 'blue'
    )
));
As you can see, the above query won't include the modification in my MSPaint example above, because it's manually joining the table in SQL (the modification integration is in the beforeFind() and afterFind() callback functions of the Booking model).
Any help on this would be greatly appreciated.
Edit
I know this is long enough already, but I thought I'd add that the reason I want to track these changes and not update the original record is that the financial aspect can't change, because it will affect reporting.
The quickest and easiest solution I can see so far is to apply modifications directly to the original booking in all cases except when it affects financial information, which is still tracked as a modification (because I don't currently need to search based on this info).
Answer 1:
It sounds like you're trying to implement a temporal database. Temporal support was one of the major additions in the ANSI/ISO SQL:2011 standard. MySQL (like most RDBMSs) lags behind the standard. Think of a temporal database as the DBMS equivalent of CVS/SVN/Git.
By contrast, the traditional database we use without temporal features can be called a Current Database.
In a Current Database, if you try to implement temporal support, you can fail in many ways with different approaches:
The one-table approach. When you need to make modifications, you do UPDATEs on your original records, and unless you have some sort of homegrown trigger/audit logic, the history trail is absent. Even if you have an audit/change log, you'd have to do some ugly digging to reconstruct the change history.
The two-table approach. Instead of making modifications in place, you split your data into two tables: one with the base/original records (e.g. booking), and another for your changes/modifications/deltas. Then at least you have your original data preserved, but again you have to write complex logic to view the original data with modifications layered on. It gets even worse if you want only some of the modifications applied.
The precalculated resultant table approach. You keep 3 or more tables: the base records, the modifications, and a table that attempts to always hold the resultant (kept up to date as base + modifications). Good luck writing the triggers and procedures to do this calculation whenever you do INSERTs, and Heaven help you if an UPDATE or DELETE is needed. The setup is fragile and can fall out of sync, for example after deadlocks and rollbacks. If you don't do this within the DB with triggers/procedures, you could try to implement the resultant calculation in the application code, but good luck with that, and it could get ugly with multi-threaded consumers. And still, you don't have easy access to resultants with only some modifications applied.
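To give a sense of the layering logic the two-table approach needs, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The table layout, the current_bookings view, the price column and the sample rows are all assumptions made for illustration, not the asker's actual schema. Each modification row carries NULL for columns it did not touch, and the view picks the newest non-NULL value per column, falling back to the original booking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (
    booking_id INTEGER PRIMARY KEY,
    abc TEXT,
    price REAL
);
-- Hypothetical layout: one row per change, NULL means "column not touched".
CREATE TABLE modifications (
    modification_id INTEGER PRIMARY KEY,
    booking_id INTEGER NOT NULL REFERENCES bookings (booking_id),
    abc TEXT,
    price REAL
);
-- Presents every booking with the newest non-NULL change per column applied.
CREATE VIEW current_bookings AS
SELECT
    b.booking_id,
    COALESCE((SELECT m.abc FROM modifications m
              WHERE m.booking_id = b.booking_id AND m.abc IS NOT NULL
              ORDER BY m.modification_id DESC LIMIT 1), b.abc) AS abc,
    COALESCE((SELECT m.price FROM modifications m
              WHERE m.booking_id = b.booking_id AND m.price IS NOT NULL
              ORDER BY m.modification_id DESC LIMIT 1), b.price) AS price
FROM bookings b;

INSERT INTO bookings VALUES
    (1234, 'grey', 100.0), (1235, 'blue', 80.0), (1236, 'blue', 50.0);
INSERT INTO modifications (booking_id, abc, price) VALUES
    (1234, 'blue', NULL),
    (1234, NULL, 120.0),
    (1236, 'grey', NULL);
""")

# Searching the view matches on the layered value, not the original one.
blue_ids = [row[0] for row in conn.execute(
    "SELECT booking_id FROM current_bookings WHERE abc = 'blue' ORDER BY booking_id"
)]

# The original record is left untouched.
original_abc = conn.execute(
    "SELECT abc FROM bookings WHERE booking_id = 1234"
).fetchone()[0]

current_1234 = conn.execute(
    "SELECT abc, price FROM current_bookings WHERE booking_id = 1234"
).fetchone()

print(blue_ids, original_abc, current_1234)
```

In this sketch, booking 1234 is found by the blue search even though its original row still says grey, and booking 1236 drops out because a modification turned it grey. Two caveats: a NULL in a modification cannot express "set this column to NULL", and the correlated subqueries run once per booking per column, so whether this performs acceptably on a large MySQL table would need to be measured.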
Conclusion: If you're not limited to MySQL, you should really consider using a DB that has built-in temporal support. Otherwise, you're going to reinvent the wheel.
Answer 2:
Instead of applying the modifications to the original record, what if you did the reverse and applied the original record to the modifications? You could modify the modifications table (or a new table) to hold the original record with the modifications applied to it, and direct your searches there.
Another thought is that if the financial data is all that needs to be preserved, why not save it in another field or table and reference it when you need it? I agree that a re-design is probably the best/smartest approach for a long-term solution, but I figured I'd put my ideas on the table in case they can help.
Answer 3:
What if you used a backup table to store data from the original table before modifying the original table? You could then use a rollback function to restore data to a previous state.
Here is a flowchart of my database update process theory: http://i1371.photobucket.com/albums/ag300/joshua127/BookingFlowchartinsert_zps5c2d55f8.png
Here is a flowchart of my selection process theory: http://i1371.photobucket.com/albums/ag300/joshua127/BookingFlowchartselect_zps702fa902.png
Hope this helps, just another way to look at it.
P.S. To keep the financial information unchanged, you could write your update functions to count the number of columns to be updated (based on your update array of column names) and provide variables to hold specific values for those columns alone. You could reference the array indexes ($array['index']) in the SQL statement to make it dynamic.
Answer 4:
It seems to me that what you need is a kind of history for the table, so that you can see what happened over time.
I usually achieve this by creating a parallel table named after the original with _history appended to it: Bookings_history in your case. The structure would be similar to the original, but with these columns prepended:
a) timestamp, which records when the modification happened
b) id, which identifies the row in the original table
A unique index would be created on these two columns.
Each time a modification happens, before applying the modification you copy the original row to the history table. Then you apply the modification on the original table. Doing so, the history table acts like a stack where you save snapshots of the original data.
I especially like this model because joining tables and running searches against the history table can be done in much the same way as for the original table, since the structures are so similar. Also, if you want to know about modifications, you just need to compare rows of the history table.
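The snapshot-then-update step could look roughly like the sketch below, written in Python with the standard-library sqlite3 module standing in for MySQL. The modify_booking helper, the abc and price columns and the explicit changed_at argument are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (id INTEGER PRIMARY KEY, abc TEXT, price REAL);
-- Same columns as bookings, with timestamp prepended (assumed layout).
CREATE TABLE bookings_history (
    timestamp TEXT NOT NULL,
    id INTEGER NOT NULL,
    abc TEXT,
    price REAL,
    UNIQUE (timestamp, id)
);
INSERT INTO bookings VALUES (1234, 'grey', 100.0);
""")

ALLOWED_COLUMNS = {"abc", "price"}


def modify_booking(conn, booking_id, changed_at, **changes):
    """Copy the current row to bookings_history, then apply the changes."""
    if not changes or set(changes) - ALLOWED_COLUMNS:
        raise ValueError("changes must name one or more known columns")
    # Column names come from the whitelist above, values are bound parameters.
    assignments = ", ".join(f"{column} = ?" for column in changes)
    with conn:  # one transaction: snapshot and update succeed or fail together
        conn.execute(
            "INSERT INTO bookings_history (timestamp, id, abc, price) "
            "SELECT ?, id, abc, price FROM bookings WHERE id = ?",
            (changed_at, booking_id),
        )
        conn.execute(
            f"UPDATE bookings SET {assignments} WHERE id = ?",
            (*changes.values(), booking_id),
        )


modify_booking(conn, 1234, "2014-06-19 10:00:00", abc="blue")
modify_booking(conn, 1234, "2014-06-20 09:30:00", price=120.0)

current = conn.execute(
    "SELECT abc, price FROM bookings WHERE id = 1234"
).fetchone()
history = conn.execute(
    "SELECT timestamp, abc, price FROM bookings_history "
    "WHERE id = 1234 ORDER BY timestamp"
).fetchall()
print(current)
print(history)
```

Because the main table always holds the latest state, existing searches and manual joins against it keep working unchanged. The oldest history row for an id is the original booking, which is what financial reports would read in this sketch.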
I hope this helps.
Answer 5:
From the answers you've gathered already, it's pretty apparent that whatever you do will require some redesign.
One of the solutions I don't see yet and that I've used in the past to solve such problems (i.e. orders that are changed) is to keep everything in the same table and use field(s) to differentiate them.
You can change the bookings table to add an incremented integer per booking (i.e. version_number) and an is_latest field. This way you can query with is_latest=true to get the current record and its version_number. If it is 0, there were no changes; if it is >0, there were changes (the number equals the number of changes). You will be able to "rewind" or "replay" the history by going from the latest version to 0 or the other way round, and each time you'll have a complete record that your app understands without modifications.
If is_latest is indexed, the querying speed will (almost) equal that of the original table, and of course you can add more booleans like is_original if you need to get the original booking often.
This has the benefit that it will most probably require you to change only the Booking model, but that will depend on your code.
EDIT: I believe that this approach will be most compatible with your requirement about reporting and financial record as you will always have the original record (version 0) easily available.
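A minimal sketch of this single-table versioning, in Python with the standard-library sqlite3 module standing in for MySQL, might look like the following. The row_id surrogate key, the add_version helper and the abc and price columns are assumptions for illustration. Note that booking_id is no longer unique per row here, so anything referencing it would need to account for versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (
    row_id INTEGER PRIMARY KEY,          -- hypothetical surrogate key
    booking_id INTEGER NOT NULL,
    version_number INTEGER NOT NULL DEFAULT 0,
    is_latest INTEGER NOT NULL DEFAULT 1,
    abc TEXT,
    price REAL,
    UNIQUE (booking_id, version_number)
);
CREATE INDEX idx_bookings_is_latest ON bookings (is_latest);
INSERT INTO bookings (booking_id, abc, price) VALUES (1234, 'grey', 100.0);
""")

ALLOWED_COLUMNS = {"abc", "price"}


def add_version(conn, booking_id, **changes):
    """Store a change as a new complete row and flag it as the latest."""
    if not changes or set(changes) - ALLOWED_COLUMNS:
        raise ValueError("changes must name one or more known columns")
    with conn:  # one transaction so exactly one row stays flagged as latest
        row = conn.execute(
            "SELECT version_number, abc, price FROM bookings "
            "WHERE booking_id = ? AND is_latest = 1",
            (booking_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no booking {booking_id}")
        version, abc, price = row
        values = {"abc": abc, "price": price, **changes}
        conn.execute(
            "UPDATE bookings SET is_latest = 0 "
            "WHERE booking_id = ? AND is_latest = 1",
            (booking_id,),
        )
        conn.execute(
            "INSERT INTO bookings "
            "(booking_id, version_number, is_latest, abc, price) "
            "VALUES (?, ?, 1, ?, ?)",
            (booking_id, version + 1, values["abc"], values["price"]),
        )


add_version(conn, 1234, abc="blue")

# Current state: filter on is_latest, then apply the usual conditions.
blue_latest = [row[0] for row in conn.execute(
    "SELECT booking_id FROM bookings WHERE is_latest = 1 AND abc = 'blue'"
)]
# Original record for reporting: version 0 is never modified.
original = conn.execute(
    "SELECT abc, price FROM bookings "
    "WHERE booking_id = 1234 AND version_number = 0"
).fetchone()
latest_version = conn.execute(
    "SELECT version_number FROM bookings WHERE booking_id = 1234 AND is_latest = 1"
).fetchone()[0]
print(blue_latest, original, latest_version)
```

With this layout, both model finds and manual joins only need an extra is_latest condition to see the up-to-date booking, while version 0 remains available for financial reporting.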
Source: https://stackoverflow.com/questions/24297338/effective-management-of-data-changes