Data warehouse schema: is it OK to directly link fact tables in DWH?

Submitted by 半世苍凉 on 2021-01-27 16:45:05

Question


Is it OK to directly link fact tables in DWH?

As I understand it, in a galaxy schema the fact tables are not linked to each other; they just share common dimension tables. But is there a DWH schema that links them directly?


Answer 1:


IMO, they shouldn’t, even if they can. Fact tables are usually huge, with potentially many billions of rows, and hold measures at a certain grain.

Linking two or more fact tables may require joining several multi-billion-row tables, which will be too expensive.

If you need to link facts in different fact tables (all dimensions are common) you’re better off doing the join only once, storing the results and using that resulting table instead. Even better if this can be done at ETL level, where you can join batch by batch.

If you join facts in two tables where one’s dimensions are a superset of the other’s, you’re better off aggregating the more granular facts to the other’s granularity and applying the solution above.

If neither set of dimensions is a superset of another then you may need to aggregate both at a common level.
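The aggregation step might look like the sketch below, again with `sqlite3` as a stand-in. The tables and columns are hypothetical: `fact_sales` is keyed by date, store, and product, `fact_shipments` by date, warehouse, and product, so neither dimension set contains the other and the common level is `(date_key, product_key)`. The `LEFT JOIN` assumes every key of interest appears on the sales side; shipment-only keys would be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fact tables with different, non-nested dimension sets.
    CREATE TABLE fact_sales (
        date_key INTEGER, store_key INTEGER, product_key INTEGER,
        units_sold INTEGER);
    CREATE TABLE fact_shipments (
        date_key INTEGER, warehouse_key INTEGER, product_key INTEGER,
        units_shipped INTEGER);
""")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20210101, 10, 1, 3), (20210101, 11, 1, 2), (20210101, 10, 2, 4)])
conn.executemany("INSERT INTO fact_shipments VALUES (?, ?, ?, ?)",
                 [(20210101, 7, 1, 5), (20210101, 8, 1, 1)])

# Aggregate each fact table to the shared grain first, then join the two
# small aggregates instead of the raw fact rows.
COMMON_GRAIN_SQL = """
WITH sales AS (
    SELECT date_key, product_key, SUM(units_sold) AS units_sold
    FROM fact_sales
    GROUP BY date_key, product_key
),
shipments AS (
    SELECT date_key, product_key, SUM(units_shipped) AS units_shipped
    FROM fact_shipments
    GROUP BY date_key, product_key
)
SELECT s.date_key, s.product_key, s.units_sold,
       COALESCE(sh.units_shipped, 0) AS units_shipped
FROM sales AS s
LEFT JOIN shipments AS sh
  ON sh.date_key = s.date_key AND sh.product_key = s.product_key
ORDER BY s.date_key, s.product_key
"""

common_grain_rows = conn.execute(COMMON_GRAIN_SQL).fetchall()
```

Aggregating before the join matters for correctness as well as cost: joining the raw rows on `(date_key, product_key)` would pair each of the two sales rows for product 1 with each of its two shipment rows, and the summed measures would be double-counted.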

The reason behind my position is that I’d rather have redundancy in storage and avoid query-time calculations than have my users wait a long time for those joins to produce results. Also, very large joins need a lot of memory, which is normally more expensive than storage.

Finally, remember a DWH normally has data loaded by ETL processes. They run in batch and can check for consistency at each run, unlike OLTP, where avoiding multiple writes of the same data is paramount to prevent inconsistency.

Opinions on this differ and you’ll most likely get different views on the matter. In the end, both approaches have their pros and cons; study both and pick the one you’re most comfortable with.




Answer 2:


The answer is obviously NO, since by definition any table that is referenced via a foreign key from a fact table is a dimension table.

On the other hand, in Kimball's model there is no strict dividing line between facts and dimensions: a table can play both roles depending on context.

So, for example, a table containing service usage is a fact table with dimensions such as time, location, contract and so on.

But the contract itself may be modeled as a fact table, i.e. a table containing the transactions that change the contract, with dimensions such as time, customer, rating model etc. (You may call it a slowly changing dimension, but this is only an alternative description of a fact table.)

But most important of all: if your model connecting two "fact" tables describes the business well, is stable, easy to load, resistant to failure and supports performant reporting queries, then the answer is obviously YES, this is the right model.




Answer 3:


No, it's not OK to directly link fact tables.

First, if you correctly model your fact tables, you won't be able to link them in a meaningful way. The only exception is fact tables that have 1:1 relations, but then the question is - maybe they should have been modeled as one fact table to begin with.

Second, linking fact tables directly goes against the core idea behind dimensional modeling - that the model should reflect the structure of the underlying business. Typically, in dimensional models fact tables represent specific business processes, and dimensions represent their contexts. That's the key difference between OLTP and dimensional databases - OLTP systems are optimized to efficiently and reliably capture transactions, whereas dimensional models are optimized to query data and make business sense out of it. It's a mistake to confuse these two concepts.



Source: https://stackoverflow.com/questions/52061383/data-warehouse-schema-is-it-ok-to-directly-link-fact-tables-in-dwh
