Oracle: how to “group by” over a range?

隐瞒了意图╮ 2020-12-07 22:26

If I have a table like this:

pkey   age
----   ---
   1     8
   2     5
   3    12
   4    12
   5    22

I can "group by" to get a count

10 answers
  •  一整个雨季
    2020-12-07 23:18

    What you are looking for is basically the data for a histogram.

    You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.

    In the simplest form, one could simply count the number of each distinct age value like you already described:

    SELECT age, count(*)
    FROM tbl
    GROUP BY age
    

    When there are too many distinct values for the x-axis, however, one may want to group them into buckets (also called groups or clusters). In your case, the buckets have a constant width of 10.

    We can avoid writing a WHEN ... THEN line for each range; if the column were something other than age, there could be hundreds of them. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.
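
    For contrast, the CASE-based variant that the paragraph above wants to avoid might look roughly like this. It is only a sketch: the inline tbl data is copied from the question so the query runs on its own, and the names labelled, age_range and the 'other' catch-all label are made up for illustration.

    ```sql
    -- Sample rows from the question, built inline so the query is self-contained.
    WITH tbl AS (
      SELECT 1 AS pkey, 8 AS age FROM dual UNION ALL
      SELECT 2,  5 FROM dual UNION ALL
      SELECT 3, 12 FROM dual UNION ALL
      SELECT 4, 12 FROM dual UNION ALL
      SELECT 5, 22 FROM dual
    ),
    labelled AS (
      -- One WHEN ... THEN line per range: this is the part that does not scale.
      SELECT CASE
               WHEN age BETWEEN  0 AND  9 THEN '0 - 9'
               WHEN age BETWEEN 10 AND 19 THEN '10 - 19'
               WHEN age BETWEEN 20 AND 29 THEN '20 - 29'
               ELSE 'other'   -- hypothetical catch-all label
             END AS age_range
      FROM tbl
    )
    SELECT age_range, COUNT(*) AS frequency
    FROM labelled
    GROUP BY age_range;
    ```

    Every additional range needs another hand-written line, which is why the arithmetic approach built up below is easier to maintain.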

    Now let's build the SQL...

    First, we need to split the ages into range-groups of 10 like so:

    • 0-9
    • 10-19
    • 20-29
    • etc.

    This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:

    FLOOR(age/10)
    

    "FLOOR returns the largest integer equal to or less than n" http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643

    Then we take the original SQL and replace age with that expression:

    SELECT FLOOR(age/10), count(*)
    FROM tbl
    GROUP BY FLOOR(age/10)
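
    As a quick sanity check of the bucket index, the expression can be evaluated against dual for three of the ages from the question. The column aliases here are invented for the example.

    ```sql
    -- Oracle's / on numbers is not integer division: 8/10 evaluates to 0.8,
    -- so FLOOR is what turns the quotient into a whole-number bucket index.
    SELECT 8/10         AS raw_quotient,  -- 0.8
           FLOOR(8/10)  AS bucket_for_8,  -- 0
           FLOOR(12/10) AS bucket_for_12, -- 1
           FLOOR(22/10) AS bucket_for_22  -- 2
    FROM dual;
    ```

    For the question's five rows (ages 8, 5, 12, 12, 22) the grouped query therefore yields a count of 2 for bucket 0, 2 for bucket 1 and 1 for bucket 2.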
    

    This is OK, but we cannot see the range yet. Instead we only see the calculated floor values, which are 0, 1, 2 ... n.

    To get the actual lower bound, we need to multiply it by 10 again, which gives 0, 10, 20 and so on:

    FLOOR(age/10) * 10
    

    We also need the upper bound of each range, which is lower bound + 10 - 1, or

    FLOOR(age/10) * 10 + 10 - 1
    

    Next, we concatenate both into a string like this:

    TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)
    

    This creates '0 - 9', '10 - 19', '20 - 29' etc.

    Now our SQL looks like this:

    SELECT 
    TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
    COUNT(*)
    FROM tbl
    GROUP BY FLOOR(age/10)
    

    Finally, apply an order and nice column aliases:

    SELECT 
    TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
    COUNT(*) AS frequency
    FROM tbl
    GROUP BY FLOOR(age/10)
    ORDER BY FLOOR(age/10)
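
    The same result can be sketched without repeating FLOOR(age/10) in every clause by computing the lower bound once in a subquery. This is a variation on the query above, not the original author's wording: the inline tbl data is copied from the question so the statement is runnable as-is, and the names bucketed, lower_bound and age_range are made up for the example.

    ```sql
    -- Sample rows from the question, built inline so the query is self-contained.
    WITH tbl AS (
      SELECT 1 AS pkey, 8 AS age FROM dual UNION ALL
      SELECT 2,  5 FROM dual UNION ALL
      SELECT 3, 12 FROM dual UNION ALL
      SELECT 4, 12 FROM dual UNION ALL
      SELECT 5, 22 FROM dual
    ),
    bucketed AS (
      -- Lower bound of the 10-wide bucket each row falls into: 0, 10, 20, ...
      SELECT FLOOR(age/10) * 10 AS lower_bound
      FROM tbl
    )
    SELECT TO_CHAR(lower_bound) || ' - ' || TO_CHAR(lower_bound + 10 - 1) AS age_range,
           COUNT(*) AS frequency
    FROM bucketed
    GROUP BY lower_bound
    ORDER BY lower_bound;
    -- Result for the sample data:
    --   0 - 9      2
    --   10 - 19    2
    --   20 - 29    1
    ```

    Ordering by the numeric lower bound rather than by the label matters once values reach three digits: as strings, '100 - 109' would sort before '20 - 29'.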
    

    However, in more complex scenarios the ranges might not be constant chunks of size 10 and may need dynamic clustering instead. Oracle has more advanced histogram functionality built in; see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
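
    One query-level way to get buckets that adapt to the data is the analytic function NTILE, which splits the ordered rows into a given number of roughly equal-sized groups. This is offered as a sketch only, not as what the linked documentation describes; the choice of 2 buckets and the names bucket, min_age and max_age are assumptions made for the example.

    ```sql
    -- Sample rows from the question, built inline so the query is self-contained.
    WITH tbl AS (
      SELECT 1 AS pkey, 8 AS age FROM dual UNION ALL
      SELECT 2,  5 FROM dual UNION ALL
      SELECT 3, 12 FROM dual UNION ALL
      SELECT 4, 12 FROM dual UNION ALL
      SELECT 5, 22 FROM dual
    )
    SELECT bucket,
           MIN(age) AS min_age,   -- observed lower edge of the bucket
           MAX(age) AS max_age,   -- observed upper edge of the bucket
           COUNT(*) AS frequency
    FROM (
      -- NTILE(2) numbers the rows 1 or 2 after sorting by age,
      -- giving the first bucket the extra row when the count is odd.
      SELECT age, NTILE(2) OVER (ORDER BY age) AS bucket
      FROM tbl
    )
    GROUP BY bucket
    ORDER BY bucket;
    ```

    Unlike the fixed-width approach, the bucket edges here come from the data, and rows with equal ages can end up in neighbouring buckets.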

    Credits to @MatthewFlaschen for his approach; I only explained the details.
