Design of a data warehouse with more than one fact table

不思量自难忘° 2020-12-29 07:07

I'm new to data warehousing. First, I want to make clear that my copy of The Data Warehouse Toolkit is on its way to my mailbox (snail mail :P). But I'm already studying al

3 Answers
  • 2020-12-29 07:39

    You can have as many fact tables as you like. In your example you may have something like:

    dimProduct lists several products -- subscription being one of those. dimTransactionType would list possible transactions (purchase, refund, recurring subscription fee ...)

    Now suppose you are interested in simplified subscription reporting, you could add a factSubscription like this:
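The diagrams that accompanied this answer are not preserved here. As a stand-in, here is a minimal sketch of what such a schema might look like, built in an in-memory SQLite database. Only dimProduct, dimTransactionType and factSubscription are named in the answer; dimDate, dimCustomer, factTransaction and every column name are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: only dimProduct, dimTransactionType and factSubscription
# are named in the answer. All other tables and all column names are assumed.
SCHEMA = """
CREATE TABLE dimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE dimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE dimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE dimTransactionType (
    TransactionTypeKey INTEGER PRIMARY KEY,
    TransactionTypeName TEXT  -- e.g. purchase, refund, recurring subscription fee
);

-- One row per transaction of any type.
CREATE TABLE factTransaction (
    DateKey INTEGER REFERENCES dimDate(DateKey),
    CustomerKey INTEGER REFERENCES dimCustomer(CustomerKey),
    ProductKey INTEGER REFERENCES dimProduct(ProductKey),
    TransactionTypeKey INTEGER REFERENCES dimTransactionType(TransactionTypeKey),
    Amount REAL
);

-- A second, simpler fact table dedicated to subscription reporting.
-- It shares dimDate and dimCustomer with factTransaction.
CREATE TABLE factSubscription (
    DateKey INTEGER REFERENCES dimDate(DateKey),
    CustomerKey INTEGER REFERENCES dimCustomer(CustomerKey),
    FeeAmount REAL
);
"""


def build_warehouse():
    """Create the sketch schema in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Note that the two fact tables have no key pointing at each other; they are related only through the dimensions they share.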

  • 2020-12-29 07:44

    I realize that I am answering an old post, but I am not satisfied with either of the answers provided. I feel that neither answered the question.

    A schema can have one or more fact tables, but those fact tables are not linked to each other by any key relationship. It is best practice not to join fact tables in a single query as you would when querying a normalized/transactional database. Because fact tables relate to each other many-to-many through their shared dimensions, joining them directly repeats rows, and the aggregated results come out incorrect.
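To make the problem concrete, here is a small sketch using an in-memory SQLite database. The table names factSubscriptionFee and factRefund, their columns, and the sample rows are all made up for illustration.

```python
import sqlite3


def fan_out_demo():
    """Compare totals from a direct fact-to-fact join with the true totals."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical fact tables: one customer with two fee rows and two refund rows.
    conn.executescript("""
        CREATE TABLE factSubscriptionFee (CustomerKey INTEGER, FeeAmount REAL);
        CREATE TABLE factRefund (CustomerKey INTEGER, RefundAmount REAL);
        INSERT INTO factSubscriptionFee VALUES (1, 10.0), (1, 10.0);
        INSERT INTO factRefund VALUES (1, 5.0), (1, 5.0);
    """)

    # Joining the two fact tables on the shared key pairs every fee row with
    # every refund row (2 x 2 = 4 rows), so each amount is counted twice.
    joined = conn.execute("""
        SELECT SUM(f.FeeAmount), SUM(r.RefundAmount)
        FROM factSubscriptionFee AS f
        JOIN factRefund AS r ON r.CustomerKey = f.CustomerKey
    """).fetchone()

    # Querying each fact table on its own gives the correct totals.
    true_fees = conn.execute(
        "SELECT SUM(FeeAmount) FROM factSubscriptionFee"
    ).fetchone()[0]
    true_refunds = conn.execute(
        "SELECT SUM(RefundAmount) FROM factRefund"
    ).fetchone()[0]

    conn.close()
    return joined, (true_fees, true_refunds)


if __name__ == "__main__":
    joined, correct = fan_out_demo()
    print("direct join:", joined)    # (40.0, 20.0), both totals doubled
    print("separately: ", correct)   # (20.0, 10.0)
```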

    The answer you are looking for is that you need to "drill across", which basically means querying each fact table (schema) separately and merging the results. This can be done in SQL or, preferably, via a reporting/analytics tool that references the data warehouse. Instead of duplicating the answers on how to do this, I will direct everyone to two very good articles:

    Three ways to drill across by Chris Adamson

    and

    The Soul of the Data Warehouse, Part Two: Drilling Across by Ralph Kimball
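As a rough illustration of the idea (not a reproduction of either article), the sketch below aggregates each fact table separately to the same grain, here one row per customer, and then merges the two result sets on the shared key. The table names, column names and sample data are hypothetical.

```python
import sqlite3


def sample_warehouse():
    """Build a tiny in-memory database with two hypothetical fact tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE factSubscriptionFee (CustomerKey INTEGER, FeeAmount REAL);
        CREATE TABLE factRefund (CustomerKey INTEGER, RefundAmount REAL);
        INSERT INTO factSubscriptionFee VALUES (1, 10.0), (1, 10.0), (2, 10.0);
        INSERT INTO factRefund VALUES (1, 5.0), (1, 5.0);
    """)
    return conn


def drill_across(conn):
    """Return {CustomerKey: (total_fees, total_refunds)} by merging two queries."""
    # Step 1: query each fact table on its own, grouped by the shared dimension key.
    fees = dict(conn.execute(
        "SELECT CustomerKey, SUM(FeeAmount) FROM factSubscriptionFee GROUP BY CustomerKey"
    ))
    refunds = dict(conn.execute(
        "SELECT CustomerKey, SUM(RefundAmount) FROM factRefund GROUP BY CustomerKey"
    ))

    # Step 2: merge on the shared key, keeping customers that appear in only
    # one of the two results (the equivalent of a full outer join).
    merged = {}
    for key in sorted(fees.keys() | refunds.keys()):
        merged[key] = (fees.get(key, 0.0), refunds.get(key, 0.0))
    return merged


if __name__ == "__main__":
    conn = sample_warehouse()
    print(drill_across(conn))  # {1: (20.0, 10.0), 2: (10.0, 0.0)}
    conn.close()
```

The merge is done in Python here only to keep the two steps visible; the same merge can be written in SQL by joining the two aggregated subqueries, or left to a reporting tool.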

  • 2020-12-29 07:46

    Taking your questions backwards.

    A data warehouse can have more than one fact table. However, you do want to minimize joins between fact tables. It's OK to duplicate fact information in different fact tables.

    Of the objects you mentioned:

    Refund is a fact. Timestamp is the dimension of the refund fact.

    Subscription fee is a fact. Timestamp is the dimension of the subscription fee fact.

    A refund can happen more than once. I'm guessing that each customer has one subscription fee. So it appears we have two fact tables so far: customer and customer refund.

    If you knew that there could be at most 3 refunds (as an example), then you would eliminate the customer refund fact table and put 3 refund columns in the customer table.

    You also mention insurance. A customer can have more than one policy. So we have a third fact table.

    A data warehouse is usually designed using a star schema. The star schema is basically one fact table connected to one or more dimension tables. You'll probably have more than one star in a data warehouse, since we already defined 3 fact tables.
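For reference, a single star can be sketched as one fact table joined to its dimensions, as below. The refund fact comes from the discussion above; the dimDate and dimCustomer tables, all column names, and the sample rows are assumptions for illustration.

```python
import sqlite3


def refunds_by_month_and_customer():
    """Query one hypothetical star: factRefund with a date and a customer dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT, YearMonth TEXT);
        CREATE TABLE dimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
        CREATE TABLE factRefund (
            DateKey INTEGER REFERENCES dimDate(DateKey),
            CustomerKey INTEGER REFERENCES dimCustomer(CustomerKey),
            RefundAmount REAL
        );
        INSERT INTO dimDate VALUES
            (20201201, '2020-12-01', '2020-12'),
            (20201215, '2020-12-15', '2020-12'),
            (20210105, '2021-01-05', '2021-01');
        INSERT INTO dimCustomer VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO factRefund VALUES
            (20201201, 1, 5.0),
            (20201215, 2, 7.5),
            (20210105, 1, 2.5);
    """)

    # The fact table sits in the middle; each join reaches out to one dimension.
    rows = conn.execute("""
        SELECT d.YearMonth, c.CustomerName, SUM(f.RefundAmount)
        FROM factRefund AS f
        JOIN dimDate AS d ON d.DateKey = f.DateKey
        JOIN dimCustomer AS c ON c.CustomerKey = f.CustomerKey
        GROUP BY d.YearMonth, c.CustomerName
        ORDER BY d.YearMonth, c.CustomerName
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in refunds_by_month_and_customer():
        print(row)
```

A second star, say for subscription fees, would reuse the same dimDate and dimCustomer tables with its own fact table.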
