Window functions to count distinct records

没有蜡笔的小新 2020-12-23 15:08

The query below is based on a complicated view and the view works as I want it to (I'm not going to include the view because I don't think it will help with the question a…

3 Answers
  • 2020-12-23 15:49

    Doing a count(distinct) as a window function requires a trick. Several levels of tricks, actually.

    Because your request as written is actually trivial (the value is always 1, since rx.drugClass is in the partitioning clause), I will make an assumption: let's say you want to count the number of unique drug classes per patid.
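    For reference, SQLite rejects DISTINCT inside a windowed aggregate just as SQL Server does. The sketch below uses Python's built-in sqlite3 module purely as a portable stand-in for SQL Server (window functions need SQLite 3.25 or later), with a hypothetical rx table reduced to (patid, drugClass). It shows the rejected windowed COUNT(DISTINCT ...) and the plain GROUP BY count that the tricks here try to reproduce on every row without collapsing the result.

```python
import sqlite3

# Hypothetical sample data: one row per prescription (patid, drugClass).
ROWS = [(1, "h3a"), (1, "h3a"), (1, "h6h"), (2, "j7c")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rx (patid INTEGER, drugClass TEXT)")
conn.executemany("INSERT INTO rx VALUES (?, ?)", ROWS)

# A windowed COUNT(DISTINCT ...) is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT patid, COUNT(DISTINCT drugClass) OVER (PARTITION BY patid) FROM rx"
    ).fetchall()
    windowed_distinct_ok = True
except sqlite3.Error as exc:
    windowed_distinct_ok = False
    print("rejected:", exc)

# The plain aggregate works, but collapses the result to one row per patid.
per_patient = dict(
    conn.execute(
        "SELECT patid, COUNT(DISTINCT drugClass) FROM rx GROUP BY patid ORDER BY patid"
    ).fetchall()
)
print(per_patient)  # {1: 2, 2: 1}
```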

    If so, do a row_number() partitioned by patid and drugClass. When it is 1 within a patid, a new drugClass is starting. Create a flag that is 1 in this case and 0 in all other cases.

    Then you can simply sum that flag over a window partitioned by patid to get the number of distinct values.
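    The flag-and-sum steps above can be sketched on a small hypothetical rx table (patid, drugName, drugClass), again using Python's sqlite3 as a stand-in for SQL Server. The sample rows and the ORDER BY drugName inside ROW_NUMBER() are assumptions; any ordering works, because only the row numbered 1 matters.

```python
import sqlite3

# Hypothetical sample data: (patid, drugName, drugClass).
ROWS = [
    (1, "drugA", "h3a"),
    (1, "drugB", "h3a"),
    (1, "drugC", "h6h"),
    (2, "drugD", "j7c"),
    (2, "drugD", "j7c"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rx (patid INTEGER, drugName TEXT, drugClass TEXT)")
conn.executemany("INSERT INTO rx VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT patid, drugName, drugClass,
       -- Sum the flags over the whole patient to get the distinct-class count.
       SUM(isFirstRowInGroup) OVER (PARTITION BY patid) AS numDrugClasses
FROM (
    SELECT patid, drugName, drugClass,
           -- Exactly one row per (patid, drugClass) gets row number 1.
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY patid, drugClass
                                        ORDER BY drugName) = 1
                THEN 1 ELSE 0 END AS isFirstRowInGroup
    FROM rx
) AS t
ORDER BY patid, drugName
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

    Every input row is kept, and each carries the distinct-class count for its patient (2 for patid 1, 1 for patid 2).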

    The query (after formatting it so I can read it) looks like:

    select t.patid, t.fillDate, t.scriptEndDate, t.drugName, t.drugClass,
           SUM(t.IsFirstRowInGroup) over (partition by t.patid) as NumDrugCount
    from (select distinct rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
                 -- 1 on exactly one row per (patid, drugClass); order by (select NULL) means "any order"
                 (case when 1 = ROW_NUMBER() over (partition by rx.drugClass, rx.patid order by (select NULL))
                       then 1 else 0
                  end) as IsFirstRowInGroup
          from (select ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn, 
                       d.patid, d.fillDate, d.scriptEndDate, d.uniqueDrugsInTimeFrame
                from DrugsPerTimeFrame as d
               ) d2 inner join
               rx
               on rx.patid = d2.patid inner join
               DrugTable dt
               on dt.drugClass = rx.drugClass
          where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate and
                dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
         ) t
    -- the outer query can only see columns of the derived table t
    order by t.patid
    
  • 2020-12-23 15:53

    I came across this question while searching for a solution to my own problem of counting distinct values, and along the way I found this post (see the last comment). I've tested that SQL and used it; it works really well for me, so I figured I would offer it here as another solution.

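    In summary: use DENSE_RANK() twice with PARTITION BY on the grouping columns, once with ORDER BY ascending and once descending on the column to count, then add the two and subtract 1. This works because if a partition holds n distinct values and a row's value ranks r ascending, it ranks n - r + 1 descending, so r + (n - r + 1) - 1 = n on every row: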

    DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName ASC) +
    DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName DESC) - 1 AS drugCountsInFamilies
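    A quick check of the DENSE_RANK() version, using a hypothetical rx table where one drugName repeats within a class (Python's sqlite3 again stands in for SQL Server; window functions need SQLite 3.25 or later):

```python
import sqlite3

# Hypothetical sample data: (drugClass, drugName); drugX appears twice in h3a.
ROWS = [
    ("h3a", "drugX"),
    ("h3a", "drugX"),
    ("h3a", "drugY"),
    ("h6h", "drugZ"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rx (drugClass TEXT, drugName TEXT)")
conn.executemany("INSERT INTO rx VALUES (?, ?)", ROWS)

# Ascending rank + descending rank - 1 equals the number of distinct
# drugName values in the row's drugClass partition.
QUERY = """
SELECT drugClass, drugName,
       DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName ASC) +
       DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName DESC) - 1
           AS drugCountsInFamilies
FROM rx
ORDER BY drugClass, drugName
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

    The repeated drugX rows do not inflate the count: class h3a reports 2 on every row and h6h reports 1.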
    

    I use this as a template for myself.

    DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC ) +
    DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount
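    One caveat with this template: DENSE_RANK() ranks NULL like any other value, whereas COUNT(DISTINCT ...) ignores NULLs, so the two can disagree. The sketch below (hypothetical table t with columns grp and val, run through Python's sqlite3) shows the gap. The null_adjusted_count column, which subtracts 1 when the partition contains a NULL, is an added adjustment and not part of the template above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, val TEXT)")
# Hypothetical group with two non-NULL values and one NULL.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("g", "a"), ("g", "b"), ("g", None)])

rows = conn.execute(
    """
    SELECT DISTINCT
           -- The template as written: NULL is ranked, so it is counted.
           DENSE_RANK() OVER (PARTITION BY grp ORDER BY val ASC) +
           DENSE_RANK() OVER (PARTITION BY grp ORDER BY val DESC) - 1 AS raw_count,
           -- Hypothetical adjustment: subtract 1 if the partition holds a NULL.
           DENSE_RANK() OVER (PARTITION BY grp ORDER BY val ASC) +
           DENSE_RANK() OVER (PARTITION BY grp ORDER BY val DESC) - 1
           - MAX(CASE WHEN val IS NULL THEN 1 ELSE 0 END) OVER (PARTITION BY grp)
               AS null_adjusted_count
    FROM t
    """
).fetchall()

# Reference value: COUNT(DISTINCT ...) skips NULLs.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT val) FROM t GROUP BY grp"
).fetchall()

print(rows)            # [(3, 2)]
print(count_distinct)  # [(2,)]
```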
    

    I hope this helps!

  • 2020-12-23 15:58

    Why would something like this not work?

    SELECT 
       IDCol_1
      ,IDCol_2
      ,Count(*) Over(Partition By IDCol_1, IDCol_2 order by IDCol_1) as numDistinct
    FROM Table_1
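    Running that query on hypothetical data answers the question: because IDCol_2 is already in the PARTITION BY, COUNT(*) counts how many rows share each (IDCol_1, IDCol_2) pair, not how many distinct IDCol_2 values each IDCol_1 has. (The ORDER BY in the OVER clause would normally make it a running count; here every row in a partition ties on IDCol_1, so the whole partition is counted.) Sketched with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (IDCol_1 INTEGER, IDCol_2 TEXT)")
# Hypothetical data: IDCol_1 = 1 has two distinct IDCol_2 values ("x" repeated).
conn.executemany(
    "INSERT INTO Table_1 VALUES (?, ?)",
    [(1, "x"), (1, "x"), (1, "x"), (1, "y")],
)

# The query from the answer, unchanged apart from a deterministic ORDER BY.
result = conn.execute(
    """
    SELECT IDCol_1, IDCol_2,
           COUNT(*) OVER (PARTITION BY IDCol_1, IDCol_2 ORDER BY IDCol_1) AS numDistinct
    FROM Table_1
    ORDER BY IDCol_1, IDCol_2
    """
).fetchall()
for row in result:
    print(row)
```

    IDCol_1 = 1 has two distinct IDCol_2 values, yet the query reports 3 on the x rows and 1 on the y row.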
    