Optimizing Dax & model for “where date between” type queries

放肆的年华 提交于 2020-01-24 09:46:11

问题


I am building a model to allow reporting on two seperate datasets, for this example we'l say a Students dataset & a Staff dataset.

The datasets are pretty seperate and the only real link between the two is Date, so from a model perspective, there is a Students star schema & a Staff Star Schema.

The data displayed is snapshot type data, answering questions like: - For a selected date, show all active employees - for a selected date, show all enrolled students

This means that when a single date is selected, the model then finds all employees where the selected date falls within the employment start & end date , and finds all students where the selected date falls within the enrolled start & end date.

This means i had to make a decision, how to return the correct data from each schema with a single date dimension. Creating a relationship would not work as relationships in Tabular dont allow "between" type queries, so i instead have one unrelated Date Dimension and the Dax for each model finds applicable rows.

The problem is that its not the most performant. for perhaps 50k rows, adding a measure can take 5-10 seconds.

Im asking if there is a better way to either write the queries, or alter the model to still let me do "between" style queries but give better performance.

Below is an example of a dax query to return all students that were enrolled on a particular date.

Thanks for any advice.

All Enrolled Students:=IF (
HASONEVALUE ( 'Date'[Date] ),
CALCULATE (
    DISTINCTCOUNT ( 'Students'[StudentID] ),
    FILTER (
        'Students',
        'Students'[StudentStartDateID] <= MIN ( 'Date'[DateID] )
            && 'Students'[StudentEndDateID] >= MAX ( 'Date'[DateID] )
    )
),
BLANK ())

回答1:


This type of scenario is often called "Events in progress" or "Events with a duration". Take a look at the links below. The answer will depend on your version of SSAS and the event duration length.

https://www.sqlbi.com/articles/analyzing-events-with-a-duration-in-dax/ https://www.sqlbi.com/articles/understanding-dax-query-plans/ https://blog.gbrueckl.at/2014/12/events-in-progress-for-time-periods-in-dax/

If these measures don't perform well (Which can happen with events that have a long duration), it may be necessary to generate a table containing a row for each day of the event. The SQL would look something like this:

SELECT        
   d.CalendarDate      
  ,s.StudentID
FROM dbo.Students AS s 
CROSS JOIN dbo.DimDate AS d      
WHERE d.CalendarDate >= StudentStartDateID      
AND d.CalendarDate <= StudentEndDateID

Create a relationship from this table to the date/calendar table.

With this design you can use a simple DISTINCTCOUNT(Students[StudentID]) measure, which should perform better. The trade-off is that this table can become quite large. Keep it as narrow as possible for best performance and memory conservation. Another optimization could be to use a different granularity such as week or month instead of day.




回答2:


Unrelated or "disconnected" tables are good for powering slicers, timelines, and filters in certain situations. As you said in your question, you have two optimization options: Re-structure your data set or optimize the existing measure syntax.

Re-Structure Dataset

Duplicate each row for every day between start and end dates with a column for that iterated date. This can be done a handful of ways depending on how you get your dataset, but could be tedious. Then, relate your tables on this iterated date and use the relation to filter from DATE to FACT. If this is a recurring report and/or you are using SQL to pull the data, this might be worth it to make use of PowerPivot's relational calculation power.

Optimize DAX statement

If this is a one-off request or the dataset would be too tedious to duplicate out by day, then stick with the disconnected table approach and clean up the measure syntax. Since you have already included the MIN() and MAX() functions and your CALCULATE() is returning DISTINCTCOUNT(), the conditional HASONEVALUE() function is unnecessary. I ran this in a simulated environment and had good results, but that can vary with computer performance and dataset size. See below for cleaned syntax.

All Enrolled Students:=CALCULATE (
    DISTINCTCOUNT('Students'[StudentID]),
    FILTER(
        'Students',
        'Students'[StudentStartDateID]<= MIN('Date'[DateID]) &&
        'Students'[StudentEndDateID]  >= MAX('Date'[DateID])
    )
)

If your StudentID column is unique, which would make sense to me, you can further speed this up.

All Enrolled Students:=CALCULATE (
    COUNT('Students'[StudentID]),
    FILTER(
        'Students',
        'Students'[StudentStartDateID]<= MIN('Date'[DateID]) &&
        'Students'[StudentEndDateID]  >= MAX('Date'[DateID])
    )
)

If StudentID is not a number replace COUNT() with COUNTA() to get the desired effect.



来源:https://stackoverflow.com/questions/39406861/optimizing-dax-model-for-where-date-between-type-queries

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!