Question
I have a Datamart in SQL Server running in a WebApi server. It has a Fact table and multiple Dim tables.
The dimension values can grow dynamically, so when fact data is received I have to check the relevant Dim table. If the dimension value does not exist, I need to add it to the Dim table.
Finally, I need to insert the fact record into the Fact table, using the foreign keys to the Dim tables.
I need to repeat this process for each fact record. What is an efficient way to add facts with dynamic dimensional data in a Datamart system?
Answer 1:
I've just answered something quite similar (https://stackoverflow.com/a/29433398/3964881), but I realise the two questions aren't the same, despite needing quite similar answers.
As I've noted in the other answer, the usual pattern is to populate your Dimension(s) first, and then your Fact(s). So for argument's sake, let's say you have a Person Dimension and a Town Dimension. You should have a process which pulls all of the relevant information about each Person from your source system(s), and loads any new ones into the Person Dimension. Likewise, another process should pull all of the Town information from your source system(s), and load any new ones into the Town Dimension.
Once that's done, your Fact load can run and can simply look up the surrogate key/ID values for Person and Town, with no need to check whether the value exists and add it if not. If your process is working correctly they will definitely already exist.
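To make the two steps concrete, here is a rough sketch of the pattern rather than a definitive implementation. It uses Python's built-in `sqlite3` module purely as a stand-in for SQL Server so that it runs on its own. The table names (`DimPerson`, `DimTown`, `FactVisit`), the column names, and the shape of the incoming rows are all made up for illustration. In SQL Server the "insert only if new" step would typically be an `INSERT ... SELECT ... WHERE NOT EXISTS` or a `MERGE` instead of SQLite's `INSERT OR IGNORE`.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE DimPerson (
    PersonKey  INTEGER PRIMARY KEY,   -- surrogate key
    PersonName TEXT NOT NULL UNIQUE   -- natural (business) key
);
CREATE TABLE DimTown (
    TownKey  INTEGER PRIMARY KEY,     -- surrogate key
    TownName TEXT NOT NULL UNIQUE     -- natural (business) key
);
CREATE TABLE FactVisit (
    PersonKey INTEGER NOT NULL REFERENCES DimPerson(PersonKey),
    TownKey   INTEGER NOT NULL REFERENCES DimTown(TownKey),
    Amount    REAL NOT NULL
);
"""


def load_dimensions(conn, rows):
    """Step 1: add any Person/Town values that are not in the dimensions yet."""
    people = sorted({r["person"] for r in rows})
    towns = sorted({r["town"] for r in rows})
    # The UNIQUE constraint plus OR IGNORE skips values that already exist.
    conn.executemany(
        "INSERT OR IGNORE INTO DimPerson (PersonName) VALUES (?)",
        [(p,) for p in people],
    )
    conn.executemany(
        "INSERT OR IGNORE INTO DimTown (TownName) VALUES (?)",
        [(t,) for t in towns],
    )


def load_facts(conn, rows):
    """Step 2: insert facts, resolving surrogate keys with plain lookups."""
    # No existence check here: step 1 has already loaded every dimension
    # value. A row whose lookup found nothing would be skipped silently.
    conn.executemany(
        """
        INSERT INTO FactVisit (PersonKey, TownKey, Amount)
        SELECT p.PersonKey, t.TownKey, :amount
        FROM DimPerson AS p
        JOIN DimTown AS t ON t.TownName = :town
        WHERE p.PersonName = :person
        """,
        rows,
    )


def run_batch(conn, rows):
    """Load one batch: dimensions first, then facts, in one transaction."""
    with conn:  # commits on success, rolls back on error
        load_dimensions(conn, rows)
        load_facts(conn, rows)
```

Calling `run_batch(conn, rows)` with a list of dicts such as `{"person": "Ann", "town": "Leeds", "amount": 10.0}` loads the batch. The point of the split is that the per-record "check, then maybe insert" logic disappears from the fact load, and each dimension is touched once per batch rather than once per fact row.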
This is a much sounder solution in case you get to a point where you have multiple Fact tables referencing the same Dimensions - otherwise, you could end up in a situation where your Fact tables have to load in a certain order to ensure the Dimension data is in place, or you could even risk ending up with duplicated Dimension data where two Fact table load processes have both loaded in the same "new" data at the same time.
Source: https://stackoverflow.com/questions/29217709/efficient-way-to-add-facts-in-a-datamart