Conditional aggregation performance

-上瘾入骨i 2020-12-03 01:45

Suppose we have the following data:

 IF OBJECT_ID('dbo.LogTable', 'U') IS NOT NULL  DROP TABLE dbo.LogTable

 SELECT TOP 100000 DATEADD(day, ( ABS(CHECKSUM(NE         


        
2 Answers
  •  死守一世寂寞
    2020-12-03 02:04

    Here is an example of mine in which subqueries on large tables were extremely slow (around 40-50 sec). I was advised to rewrite the query with FILTER (conditional aggregation), which brought it down to 1 sec. I was amazed.

    I now always use FILTER-based conditional aggregation, because the large tables are joined only once and each tabulated column is then picked out with FILTER. Running a separate sub-select against a large table for every output column is a bad idea.
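
    To make the idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. The `answers` table and its rows are made up for illustration and do not come from the thread. It reads the table once and spreads the values into columns, first with the portable CASE-inside-aggregate form and then with FILTER where the SQLite build supports it (3.30 or later; Postgres has had FILTER since 9.4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical toy table, not the schema from the thread.
    CREATE TABLE answers (event_id INTEGER, question_id INTEGER, description TEXT);
    INSERT INTO answers VALUES
        (1, 13, 'light'),
        (1, 14, 'work'),
        (2, 13, 'hard');
""")

# Portable form: rows that fail the condition produce NULL,
# and MAX ignores NULLs, so each column sees only its own rows.
case_sql = """
    SELECT event_id,
           MAX(CASE WHEN question_id = 13 THEN description END) AS intensity,
           MAX(CASE WHEN question_id = 14 THEN description END) AS commute
    FROM answers
    GROUP BY event_id
    ORDER BY event_id
"""
case_rows = conn.execute(case_sql).fetchall()

# FILTER form: same single pass, with the condition attached to the aggregate.
filter_sql = """
    SELECT event_id,
           MAX(description) FILTER (WHERE question_id = 13) AS intensity,
           MAX(description) FILTER (WHERE question_id = 14) AS commute
    FROM answers
    GROUP BY event_id
    ORDER BY event_id
"""
if sqlite3.sqlite_version_info >= (3, 30, 0):
    filter_rows = conn.execute(filter_sql).fetchall()
else:
    # Older SQLite builds lack aggregate FILTER; fall back to the CASE result.
    filter_rows = case_rows

print(case_rows)
```

    Both forms scan `answers` once per query, no matter how many output columns are added.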

    Thread: SQL Performance Issues with Inner Selects in Postgres for tabulated report

    I needed a tabulated report like the following.

    Example (easy flat stuff first, then the complicated tabulated stuff):

    RecallID | RecallDate | Event |..| WalkAlone | WalkWithPartner |..| ExerciseAtGym
    256      | 10-01-19   | Exrcs |..| NULL      | NULL            |..| yes
    256      | 10-01-19   | Walk  |..| yes       | NULL            |..| NULL
    256      | 10-01-19   | Eat   |..| NULL      | NULL            |..| NULL
    257      | 10-01-19   | Exrcs |..| NULL      | NULL            |..| yes
    

    My SQL used inner selects for the tabulated, answer-based columns and looked like this:

    select 
    -- Easy flat stuff first
    r.id as recallid, r.recall_date as recalldate, ... ,
    
    -- Example of Tabulated Columns:
    (select l.description from answers_t ans, activity_questions_t aq, lookup_t l 
    where l.id=aq.answer_choice_id and aq.question_id=13 
    and aq.id=ans.activity_question_id and aq.activity_id=27 and ans.event_id=e.id) 
         as transportationotherintensity,
    (select l.description from answers_t ans, activity_questions_t aq, lookup_t l
    where l.id=66 and l.id=aq.answer_choice_id and aq.question_id=14
    and aq.id=ans.activity_question_id and ans.event_id=e.id) 
         as commutework,
    (select l.description from answers_t ans, activity_questions_t aq, lookup_t l
    where l.id=67 and l.id=aq.answer_choice_id and aq.question_id=14
    and aq.id=ans.activity_question_id and ans.event_id=e.id) 
         as commuteschool,
    (select l.description from answers_t ans, activity_questions_t aq, lookup_t l
    where l.id=95 and l.id=aq.answer_choice_id and aq.question_id=14
    and aq.id=ans.activity_question_id and ans.event_id=e.id) 
         as dropoffpickup,
    

    The performance was horrible. Gordon Linoff recommended joining the large table ANSWERS_T only once and applying FILTER to each tabulated column instead of running a separate select per column. That sped it up to 1 sec.

    select ans.event_id,
           max(l.description) filter (where aq.question_id = 13 and aq.activity_id = 27) as transportationotherintensity,
           max(l.description) filter (where l.id = 66 and aq.question_id = 14) as commutework,
           . . .
    from activity_questions_t aq join
         lookup_t l 
         on l.id = aq.answer_choice_id join
         answers_t ans
         on aq.id = ans.activity_question_id
    group by ans.event_id
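
    The rewrite can be checked on toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres. The `events_t` table name (the thread only shows the alias `e`), the column types, and every row are assumptions made for illustration. It builds both query shapes from the same conditions and compares their results; on SQLite builds older than 3.30 it falls back to CASE inside the aggregate, which is equivalent to FILTER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of the thread's schema; events_t is an assumed name.
    CREATE TABLE events_t (id INTEGER PRIMARY KEY);
    CREATE TABLE lookup_t (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE activity_questions_t (
        id INTEGER PRIMARY KEY, question_id INTEGER,
        activity_id INTEGER, answer_choice_id INTEGER);
    CREATE TABLE answers_t (event_id INTEGER, activity_question_id INTEGER);

    INSERT INTO events_t VALUES (100), (101);
    INSERT INTO lookup_t VALUES (50, 'moderate'), (66, 'to work'), (67, 'to school');
    INSERT INTO activity_questions_t VALUES
        (1, 13, 27, 50), (2, 14, 27, 66), (3, 14, 27, 67);
    INSERT INTO answers_t VALUES (100, 1), (100, 2), (101, 3);
""")

# The three-table join shared by both query shapes.
JOINS = """
    FROM answers_t ans
    JOIN activity_questions_t aq ON aq.id = ans.activity_question_id
    JOIN lookup_t l ON l.id = aq.answer_choice_id
"""

# One condition per tabulated output column, taken from the inner selects.
CONDITIONS = {
    "transportationotherintensity": "aq.question_id = 13 AND aq.activity_id = 27",
    "commutework": "l.id = 66 AND aq.question_id = 14",
    "commuteschool": "l.id = 67 AND aq.question_id = 14",
}

# Shape 1: one correlated sub-select per column, so the join is
# re-evaluated for every column of every event.
subselects = ",\n".join(
    f"(SELECT l.description {JOINS} WHERE {cond} AND ans.event_id = e.id) AS {name}"
    for name, cond in CONDITIONS.items()
)
subquery_sql = f"SELECT e.id, {subselects} FROM events_t e ORDER BY e.id"

# Shape 2: join once, then pick each column out with a conditional aggregate.
use_filter = sqlite3.sqlite_version_info >= (3, 30, 0)

def cond_max(cond):
    """Return MAX(l.description) restricted to rows matching cond."""
    if use_filter:
        return f"MAX(l.description) FILTER (WHERE {cond})"
    return f"MAX(CASE WHEN {cond} THEN l.description END)"

aggregates = ",\n".join(f"{cond_max(c)} AS {n}" for n, c in CONDITIONS.items())
agg_sql = (
    f"SELECT ans.event_id, {aggregates} {JOINS} "
    "GROUP BY ans.event_id ORDER BY ans.event_id"
)

subquery_rows = conn.execute(subquery_sql).fetchall()
agg_rows = conn.execute(agg_sql).fetchall()
print(agg_rows)
```

    One difference to keep in mind: the aggregated query only returns events that have at least one row in answers_t, whereas the sub-select form returns every event. Joining the aggregate back to the events table with a LEFT JOIN restores the missing rows.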
    
