How do I dimensionally model this relationship in a Kimball-style data warehouse?

Posted by 倾然丶 夕夏残阳落幕 on 2020-01-07 00:58:11

Question


So I have two dimensions in my data warehouse:

dim_machine
-------------
machine_key
machine_name
machine_type


dim_tool
------------
tool_key
tool_name
machine_type

What I want to make sure of is the machine_type field in both dimensions has the same data. Should I create a third dimension to snowflake between the two or is there another alternative?


Answer 1:


I'm not sure exactly what problem you're trying to solve. This sounds like something you would simply build into the ETL process: for both dimensions, map your source data to the same target list of machine types. If a new value appears that has no mapping, raise an error (or set a default placeholder value and review the data later).
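As a minimal sketch of that ETL step (the mapping table, the function name `conform_machine_type`, and the sample values are all hypothetical, not taken from the question), both dimension loads could share one lookup:

```python
# Hypothetical mapping from raw source spellings to the conformed
# list of machine types. The values are invented for illustration.
MACHINE_TYPE_MAP = {
    "cnc": "CNC",
    "c.n.c.": "CNC",
    "lathe": "Lathe",
    "turning lathe": "Lathe",
}

UNKNOWN = "Unknown"  # placeholder used when strict checking is off


def conform_machine_type(raw, strict=True):
    """Map a raw source value to a conformed machine type.

    With strict=True an unmapped value raises ValueError so the load
    stops; with strict=False it is loaded as the placeholder and can
    be reviewed later.
    """
    key = (raw or "").strip().lower()
    if key in MACHINE_TYPE_MAP:
        return MACHINE_TYPE_MAP[key]
    if strict:
        raise ValueError(f"unmapped machine type: {raw!r}")
    return UNKNOWN


# Both dimension loads call the same function, so dim_machine and
# dim_tool can only ever contain values from the same target list.
machine_rows = [{"machine_name": "M-01",
                 "machine_type": conform_machine_type(" CNC ")}]
tool_rows = [{"tool_name": "T-07",
              "machine_type": conform_machine_type("c.n.c.")}]
```

Whether to fail the load or fall back to the placeholder is a policy choice; the `strict` flag here only illustrates the two options mentioned above.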

A completely different option would be a "mini-dimension" (Kimball's term) that holds all possible machine/tool combinations. If two dimensions are closely related and often used together in searches, then it can be a useful way to consolidate and simplify them. But even then, I assume you will be checking and cleaning the source data to conform the machine types.
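One way to picture such a combined dimension is the sketch below, using an in-memory SQLite database. It assumes a combination counts as "possible" when the tool's machine_type equals the machine's machine_type; the table name `dim_machine_tool` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dim_machine (machine_key integer primary key,
                          machine_name text, machine_type text);
create table dim_tool    (tool_key integer primary key,
                          tool_name text, machine_type text);

insert into dim_machine values (1, 'M-01', 'CNC'), (2, 'M-02', 'Lathe');
insert into dim_tool    values (1, 'T-07', 'CNC'), (2, 'T-08', 'CNC'),
                               (3, 'T-09', 'Lathe');

-- Hypothetical combined dimension: one row per machine/tool pair
-- whose machine types agree (an assumption, see the text above).
create table dim_machine_tool (
    machine_tool_key integer primary key autoincrement,
    machine_name text,
    tool_name    text,
    machine_type text
);

insert into dim_machine_tool (machine_name, tool_name, machine_type)
select m.machine_name, t.tool_name, m.machine_type
from dim_machine as m
join dim_tool    as t on t.machine_type = m.machine_type;
""")

combos = conn.execute("""
    select machine_name, tool_name, machine_type
    from dim_machine_tool
    order by machine_name, tool_name
""").fetchall()

for row in combos:
    print(row)
```

Because machine_type is stored once per combination, the two source dimensions cannot disagree inside this table, but the source values still have to be conformed before the join.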




Answer 2:


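Keep in mind that a data warehouse is a de-normalized structure, so it is normal for data to repeat across dimensions. Integrity should be enforced in the operational system and the ETL process. Suppose we have a model in which a fact table, fact_installed_tools, references dim_machine, dim_tool, dim_date and dim_employee, and dim_tool carries a matching_machine_type column.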

The business process that dispenses tools has to know which tool can be installed on which machine. Suppose a wrong tool is somehow installed on a machine. It is better to import data to match that fact and run a report that will discover a bug in the business process, than to break the ETL process because the tool and machine types do not match.

For example, a query (report) like this would discover a mismatch and would prove quite useful.

select
      'tool-machine mismatch' as alarm
    , d.full_date
    , m.machine_name
    , m.machine_type
    , t.tool_name
    , t.matching_machine_type
    , e.employee_full_name
from fact_installed_tools as f
join dim_machine          as m on m.machine_key  = f.machine_key
join dim_tool             as t on t.tool_key     = f.installed_tool_key
join dim_date             as d on d.date_key     = f.date_key
join dim_employee         as e on e.employee_key = f.employee_key
where m.machine_type != t.matching_machine_type ;
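To try the report end to end, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the query above; the column types and every sample row are invented for illustration, and a real warehouse would have many more attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dim_machine  (machine_key integer primary key,
                           machine_name text, machine_type text);
create table dim_tool     (tool_key integer primary key,
                           tool_name text, matching_machine_type text);
create table dim_date     (date_key integer primary key, full_date text);
create table dim_employee (employee_key integer primary key,
                           employee_full_name text);
create table fact_installed_tools (date_key integer, machine_key integer,
                                   installed_tool_key integer,
                                   employee_key integer);

insert into dim_machine  values (1, 'M-01', 'CNC'), (2, 'M-02', 'Lathe');
insert into dim_tool     values (1, 'T-07', 'CNC'), (2, 'T-09', 'Lathe');
insert into dim_date     values (20100101, '2010-01-01');
insert into dim_employee values (1, 'A. Example');

-- The second fact row installs a Lathe tool on a CNC machine.
insert into fact_installed_tools values
    (20100101, 1, 1, 1),
    (20100101, 1, 2, 1),
    (20100101, 2, 2, 1);
""")

MISMATCH_SQL = """
select 'tool-machine mismatch' as alarm, d.full_date, m.machine_name,
       m.machine_type, t.tool_name, t.matching_machine_type,
       e.employee_full_name
from fact_installed_tools as f
join dim_machine  as m on m.machine_key  = f.machine_key
join dim_tool     as t on t.tool_key     = f.installed_tool_key
join dim_date     as d on d.date_key     = f.date_key
join dim_employee as e on e.employee_key = f.employee_key
where m.machine_type != t.matching_machine_type
"""

mismatches = conn.execute(MISMATCH_SQL).fetchall()
for row in mismatches:
    print(row)
```

With this sample data, only the fact row that pairs tool T-09 with machine M-01 is reported; the mismatched row is still loaded, which is the point of handling it in a report rather than in the ETL.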


Source: https://stackoverflow.com/questions/3906275/how-do-i-dimensionally-model-this-relationship-in-a-kimball-style-data-warehouse
