How can a multi-valued dimension be expressed in a star schema, given that it has a 1-to-many relationship [Dim 1: many Fact]?

Question

I am new to data warehouse practices, and for an academic exercise I would like to create a star schema from a dataset in a chosen area of interest. My classmate and I chose a dataset of car accidents in one country over one year.

The problem is that in many cases, if not most, more than one car is involved. So if I choose accident incidents as the fact table, with "Driver", "Car", "Casualties", "Location", "Conditions", etc. as dimensions, how can these be transformed into a star schema when the "Car", "Driver" and "Casualties" dimensions are multivalued? For example, a single accident can involve 3 cars, 3 drivers and 7 casualties. Note that using a star schema is mandatory.

Also, as far as I know, the measures in a fact table are most often numeric values. Can a fact table also have categorical variables as measures?

Answer 1:

The most common approach is to use a bridge table http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/multivalued-dimension-bridge-table/
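As a rough illustration of the bridge-table pattern applied to the accident data, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. All table and column names (fact_accident, bridge_vehicle_group, dim_vehicle, vehicle_group_key) and the sample rows are hypothetical. The fact stays at accident grain and points to a vehicle group; the bridge lists which vehicles belong to each group.

```python
import sqlite3

# In-memory database; every table/column name here is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_vehicle (
    vehicle_key INTEGER PRIMARY KEY,
    make        TEXT,
    body_type   TEXT
);
-- One row per (group, vehicle): a group is the set of vehicles in one accident.
CREATE TABLE bridge_vehicle_group (
    vehicle_group_key INTEGER,
    vehicle_key       INTEGER REFERENCES dim_vehicle(vehicle_key),
    PRIMARY KEY (vehicle_group_key, vehicle_key)
);
-- The fact stays at accident grain and references a group, not a single vehicle.
CREATE TABLE fact_accident (
    accident_id       INTEGER PRIMARY KEY,
    vehicle_group_key INTEGER,
    casualty_count    INTEGER
);
""")
conn.executemany("INSERT INTO dim_vehicle VALUES (?, ?, ?)",
                 [(1, "Ford", "Car"), (2, "Volvo", "Truck"), (3, "Fiat", "Car")])
conn.executemany("INSERT INTO bridge_vehicle_group VALUES (?, ?)",
                 [(10, 1), (10, 2), (10, 3),   # group 10: three vehicles
                  (11, 3)])                    # group 11: one vehicle
conn.executemany("INSERT INTO fact_accident VALUES (?, ?, ?)",
                 [(100, 10, 7), (101, 11, 1)])

# Number of accidents involving at least one vehicle of each body type.
rows = conn.execute("""
    SELECT v.body_type, COUNT(DISTINCT f.accident_id) AS accidents
    FROM fact_accident f
    JOIN bridge_vehicle_group b ON b.vehicle_group_key = f.vehicle_group_key
    JOIN dim_vehicle v          ON v.vehicle_key = b.vehicle_key
    GROUP BY v.body_type
    ORDER BY v.body_type
""").fetchall()
print(rows)
```

COUNT(DISTINCT ...) is used because joining through the bridge produces one row per vehicle; summing an accident-level measure such as casualty_count over that join would count it once per vehicle, which is the main thing to watch for with bridge tables.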

Answer 2:

A very important concept in dimensional modelling is that of the grain. Ralph Kimball (whose work you will run into again and again if you're learning about dimensional modelling) emphasizes that it's really important to model from the lowest possible grain up. This lets you slice and dice your data in as many ways as possible, summing up from the lowest to any higher granularities.

Quite often when you find one of these issues where everything seems to be many-to-many, the issue is actually that you've chosen the wrong grain for the fact table in question. With apologies to Nick.McDermaid (who has suggested this granularity change in comments), "participation of an individual in an accident" is a lower granularity than "accident," so lowering the granularity of the fact table to at least that level - and creating an Accident dimension - makes a lot of sense.
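A minimal sketch of what lowering the grain could look like, again with sqlite3 and hypothetical names (dim_accident, dim_role, fact_participation) and sample rows. It assumes the data records each person involved in an accident, their role, which vehicle they were in, and whether they were a casualty. Accident-level figures such as the number of vehicles, drivers or casualties then come from aggregating the fine-grained rows, with no bridge table.

```python
import sqlite3

# Hypothetical schema: the fact grain is "one participant in one accident".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_accident (
    accident_key INTEGER PRIMARY KEY,
    report_no    TEXT,
    road_type    TEXT
);
CREATE TABLE dim_role (
    role_key  INTEGER PRIMARY KEY,
    role_name TEXT              -- 'Driver', 'Passenger', 'Pedestrian'
);
CREATE TABLE fact_participation (
    accident_key INTEGER REFERENCES dim_accident(accident_key),
    role_key     INTEGER REFERENCES dim_role(role_key),
    vehicle_no   INTEGER,       -- vehicle within the accident; NULL for pedestrians
    casualty     INTEGER        -- 1 if this participant was a casualty, else 0
);
""")
conn.executemany("INSERT INTO dim_accident VALUES (?, ?, ?)",
                 [(1, "R-001", "Urban"), (2, "R-002", "Motorway")])
conn.executemany("INSERT INTO dim_role VALUES (?, ?)",
                 [(1, "Driver"), (2, "Passenger"), (3, "Pedestrian")])
conn.executemany("INSERT INTO fact_participation VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1), (1, 1, 2, 0), (1, 2, 2, 1), (1, 3, None, 1),  # accident 1
    (2, 1, 1, 0),                                               # accident 2
])

# Roll the fine grain up to accident level (COUNT DISTINCT ignores NULLs).
rows = conn.execute("""
    SELECT a.report_no,
           COUNT(*)                     AS participants,
           COUNT(DISTINCT f.vehicle_no) AS vehicles,
           SUM(CASE WHEN r.role_name = 'Driver' THEN 1 ELSE 0 END) AS drivers,
           SUM(f.casualty)              AS casualties
    FROM fact_participation f
    JOIN dim_accident a ON a.accident_key = f.accident_key
    JOIN dim_role r     ON r.role_key = f.role_key
    GROUP BY a.report_no
    ORDER BY a.report_no
""").fetchall()
print(rows)
```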

It's possible that's not the lowest granularity, though; for instance, if your data set tracks injuries, each participant might have multiple injuries. In that case the fact table grain might be better off as "injuries sustained during an accident", and you would need a row in your Injury dimension indicating "no injury" so that participants who were not injured are still included. So the first thing you should do isn't to decide what your fact table is; it's to sift through the data and figure out what your lowest granularity is. Once you've done that, you should have a good handle on what your fact table will be modelled around, and which dimensions you need.
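A minimal sketch of the injury grain with a "No injury" member, under the assumption that the source data lists injuries per participant. The table names, keys and sample rows are hypothetical. The uninjured participant still has exactly one fact row, so participant counts remain correct alongside injury counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_injury (
    injury_key  INTEGER PRIMARY KEY,
    injury_type TEXT
);
-- Grain: one row per injury sustained by a participant in an accident.
CREATE TABLE fact_injury (
    accident_id    INTEGER,
    participant_id INTEGER,
    injury_key     INTEGER REFERENCES dim_injury(injury_key)
);
""")
conn.executemany("INSERT INTO dim_injury VALUES (?, ?)",
                 [(0, "No injury"), (1, "Fracture"), (2, "Whiplash")])
conn.executemany("INSERT INTO fact_injury VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 1, 2),  # participant 1: two injuries
    (1, 2, 0),             # participant 2: uninjured, kept via the 'No injury' row
    (1, 3, 2),             # participant 3: one injury
])

# Participants, injured participants and injuries for accident 1.
participants, injured, injuries = conn.execute("""
    SELECT COUNT(DISTINCT f.participant_id),
           COUNT(DISTINCT CASE WHEN d.injury_type <> 'No injury'
                               THEN f.participant_id END),
           SUM(CASE WHEN d.injury_type <> 'No injury' THEN 1 ELSE 0 END)
    FROM fact_injury f
    JOIN dim_injury d ON d.injury_key = f.injury_key
    WHERE f.accident_id = 1
""").fetchone()
print(participants, injured, injuries)
```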

Dimensional modelling can be a tough nut to crack because there are multiple ways to do things, and the most correct way often isn't obvious, especially if you're coming from a background where you're used to more normalised data structures. I'd suggest, first and foremost, trying to model something using only the most basic table types (i.e. avoiding things like snowflaking, bridge tables, etc.) and seeing whether you can come up with a solution that doesn't need those tricks. Very often this will lead to a better model: one that is simpler to navigate, has better query performance, and can be used to answer more questions.

Nick.McDermaid's advice to experiment and try different things is also solid, as it can help you to break yourself out of your initial assumptions. There are sometimes multiple potential designs - thinking them all through thoroughly can be necessary to decide which is best.

Answer 3:

I've had to model this very thing at my company.

Incident & Vehicle are at their own grains. You'll need a FactIncident & a FactIncidentVehicle. This allows you to associate attributes that are related to the Incident (Date, location, type) as well as attributes for each Vehicle in the Incident.

The Incident dimension is almost a degenerate dimension: besides the IncidentID, it contains just a few attributes, such as the Police Report Number.

The Incident Vehicle dimension also has just a few attributes, specific to the vehicle in this incident only, such as whether the vehicle was towed.
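A minimal sketch of the incident and incident-vehicle grains described above, using sqlite3 with hypothetical table names and sample rows. The incident-grain fact carries incident-level attributes; the incident-vehicle fact has one row per vehicle per incident and points to an Incident Vehicle dimension holding per-vehicle attributes such as towed. Both share the incident key, so results can be combined at incident level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Near-degenerate Incident dimension: the ID plus a few attributes.
CREATE TABLE dim_incident (
    incident_key  INTEGER PRIMARY KEY,
    police_report TEXT
);
-- Per-vehicle-in-this-incident attributes only.
CREATE TABLE dim_incident_vehicle (
    incident_vehicle_key INTEGER PRIMARY KEY,
    towed                TEXT   -- 'Y' / 'N'
);
-- Incident grain. Location is plain text here for brevity; in a real model
-- it would be a key into a Location dimension.
CREATE TABLE fact_incident (
    incident_key  INTEGER REFERENCES dim_incident(incident_key),
    incident_date TEXT,
    location      TEXT
);
-- Incident-vehicle grain: one row per vehicle per incident.
CREATE TABLE fact_incident_vehicle (
    incident_key         INTEGER REFERENCES dim_incident(incident_key),
    incident_vehicle_key INTEGER REFERENCES dim_incident_vehicle(incident_vehicle_key),
    vehicle_key          INTEGER  -- key into a general Vehicle dimension (not shown)
);
""")
conn.executemany("INSERT INTO dim_incident VALUES (?, ?)",
                 [(1, "PR-1001"), (2, "PR-1002")])
conn.executemany("INSERT INTO dim_incident_vehicle VALUES (?, ?)",
                 [(1, "Y"), (2, "N"), (3, "Y"), (4, "N")])
conn.executemany("INSERT INTO fact_incident VALUES (?, ?, ?)",
                 [(1, "2016-03-01", "Main St"), (2, "2016-03-02", "Ring Rd")])
conn.executemany("INSERT INTO fact_incident_vehicle VALUES (?, ?, ?)",
                 [(1, 1, 501), (1, 2, 502), (1, 3, 503), (2, 4, 504)])

# Aggregate the vehicle grain to incident grain, then join on the shared key.
rows = conn.execute("""
    SELECT d.police_report, i.location,
           COALESCE(v.vehicles, 0) AS vehicles,
           COALESCE(v.towed, 0)    AS towed
    FROM fact_incident i
    JOIN dim_incident d ON d.incident_key = i.incident_key
    LEFT JOIN (SELECT fv.incident_key,
                      COUNT(*) AS vehicles,
                      SUM(CASE WHEN iv.towed = 'Y' THEN 1 ELSE 0 END) AS towed
               FROM fact_incident_vehicle fv
               JOIN dim_incident_vehicle iv
                 ON iv.incident_vehicle_key = fv.incident_vehicle_key
               GROUP BY fv.incident_key) v ON v.incident_key = i.incident_key
    ORDER BY d.police_report
""").fetchall()
print(rows)
```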

Incident Vehicle Person is yet another grain. If your data allows for non-vehicle incidents (e.g. Trip & Fall), you'll need a 'No Vehicle' record in your vehicle dimension, as well as an 'Unknown' vehicle record.
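A minimal sketch of the incident-vehicle-person grain with the two special vehicle records. The key values 0 for 'No Vehicle' and -1 for 'Unknown' are a hypothetical convention, not something the answer specifies, and the table names and rows are illustrative. Because every fact row points at a real dimension row, the foreign key never needs to be NULL and an inner join drops nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_vehicle (
    vehicle_key INTEGER PRIMARY KEY,
    description TEXT
);
-- Grain: one row per person per vehicle per incident.
CREATE TABLE fact_incident_vehicle_person (
    incident_key INTEGER,
    vehicle_key  INTEGER NOT NULL REFERENCES dim_vehicle(vehicle_key),
    person_key   INTEGER
);
""")
# Special members (hypothetical keys): -1 = unknown vehicle, 0 = no vehicle involved.
conn.executemany("INSERT INTO dim_vehicle VALUES (?, ?)", [
    (-1, "Unknown"), (0, "No Vehicle"), (501, "Ford Focus"), (502, "Volvo FH"),
])
conn.executemany("INSERT INTO fact_incident_vehicle_person VALUES (?, ?, ?)", [
    (1, 501, 9001), (1, 501, 9002), (1, 502, 9003),  # two-vehicle collision
    (2, 0, 9004),                                     # trip & fall: no vehicle
    (3, -1, 9005),                                    # vehicle not recorded
])

# People per vehicle record; the special members show up like any other row.
rows = conn.execute("""
    SELECT v.description, COUNT(*) AS people
    FROM fact_incident_vehicle_person f
    JOIN dim_vehicle v ON v.vehicle_key = f.vehicle_key
    GROUP BY v.description
    ORDER BY v.description
""").fetchall()
print(rows)
```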

A junk dimension is useful for holding the flag (Y/N/Unk) questions, such as injured, cited, owner, driver, etc.
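A minimal sketch of such a junk dimension, assuming four flags named after the examples above, each taking the values Y, N or Unk. The table name dim_person_flags and the helper flags_key are hypothetical. Since there are only 81 combinations (3 values for each of 4 flags), the dimension can be pre-populated with every combination, and a fact row then stores a single key instead of one column per flag.

```python
import itertools
import sqlite3

# Hypothetical flag columns, named after the examples in the answer.
FLAGS = ("injured", "cited", "owner", "driver")
VALUES = ("Y", "N", "Unk")

conn = sqlite3.connect(":memory:")
# SQL text is built only from the fixed FLAGS tuple, never from user input.
flag_list = ", ".join(FLAGS)
columns = ", ".join(f"{name} TEXT NOT NULL" for name in FLAGS)
conn.execute(f"CREATE TABLE dim_person_flags (flags_key INTEGER PRIMARY KEY, "
             f"{columns}, UNIQUE ({flag_list}))")

# Pre-populate every combination of flag values.
combos = list(itertools.product(VALUES, repeat=len(FLAGS)))
placeholders = ", ".join(["?"] * (len(FLAGS) + 1))
conn.executemany(f"INSERT INTO dim_person_flags VALUES ({placeholders})",
                 [(key, *combo) for key, combo in enumerate(combos, start=1)])


def flags_key(conn, **flags):
    """Look up the junk-dimension key for one combination of flag values."""
    where = " AND ".join(f"{name} = ?" for name in FLAGS)
    row = conn.execute(f"SELECT flags_key FROM dim_person_flags WHERE {where}",
                       [flags[name] for name in FLAGS]).fetchone()
    return row[0]


# A fact row stores this one small key instead of four separate flag columns.
key = flags_key(conn, injured="Y", cited="N", owner="Unk", driver="Y")
print(key)
```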

This approach works great: it allows an incident to have 0 to many vehicles and 1 to many people, and it also handles cases where the same person or vehicle is part of more than one incident (for fleet/employee records).

Source: https://stackoverflow.com/questions/40774305/how-can-a-multi-valued-dimension-be-expressed-in-a-star-schema-given-that-it-has

Tags

data-warehouse

business-intelligence

star-schema