percentiles from histogram data

醉话见心 2020-12-07 05:36

The following table captures student grade data over a number of exams.

CREATE TABLE grades
AS
  SELECT name, exams, grade_poor, grade_fair, grade_good, grad         


        
2 Answers
  • 2020-12-07 06:07

    First, unpivot the four grade columns into one array per student:

    SELECT name,
      ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
    FROM grades
    
     name  |   array   
    -------+-----------
     arun  | {1,4,2,1}
     neha  | {3,2,1,4}
     ram   | {1,1,3,0}
     radha | {0,3,1,4}
    

    Next we need to index into grades. We do that with a CROSS JOIN LATERAL against generate_series(1,4), which pairs each of the 4 rows with each of the 4 array positions, giving 4*4 = 16 rows.

    SELECT name, grades, gs1.x, grades[gs1.x] AS gradeqty
    FROM (
      SELECT name,
        ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
      FROM grades
    ) AS t(name, grades)
      CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
    ORDER BY name, x;
    
    
     name  |  grades   | x |  gradeqty
    -------+-----------+---+----------
     arun  | {1,4,2,1} | 1 |        1
     arun  | {1,4,2,1} | 2 |        4
     arun  | {1,4,2,1} | 3 |        2
     arun  | {1,4,2,1} | 4 |        1
     neha  | {3,2,1,4} | 1 |        3
     neha  | {3,2,1,4} | 2 |        2
     neha  | {3,2,1,4} | 3 |        1
     neha  | {3,2,1,4} | 4 |        4
     radha | {0,3,1,4} | 1 |        0
     radha | {0,3,1,4} | 2 |        3
     radha | {0,3,1,4} | 3 |        1
     radha | {0,3,1,4} | 4 |        4
     ram   | {1,1,3,0} | 1 |        1
     ram   | {1,1,3,0} | 2 |        1
     ram   | {1,1,3,0} | 3 |        3
     ram   | {1,1,3,0} | 4 |        0
    (16 rows)
    

    What remains is a second CROSS JOIN LATERAL that repeats x (our grade) gradeqty times. A grade with a count of 0 produces no rows, because generate_series(1,0) is empty.

    SELECT name,
      gs1.x
    FROM (
      SELECT name,
        ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
      FROM grades
    ) AS t(name, grades)
    CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
    CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
    ORDER BY name, gs1.x;
    
     name  | x 
    -------+---
     arun  | 1
     arun  | 2
     arun  | 2
     arun  | 2
     arun  | 2
     arun  | 3
     arun  | 3
     arun  | 4
     neha  | 1
     neha  | 1
     neha  | 1
     neha  | 2
     neha  | 2
     neha  | 3
     neha  | 4
     neha  | 4
     neha  | 4
     neha  | 4
     radha | 2
     radha | 2
     radha | 2
     radha | 3
     radha | 4
     radha | 4
     radha | 4
     radha | 4
     ram   | 1
     ram   | 2
     ram   | 3
     ram   | 3
     ram   | 3
    (31 rows)
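
As a cross-check outside the database, the same expansion can be sketched in Python. This is only an illustration: the `grades` dict is copied by hand from the array output earlier in this answer, and the variable names are made up for the example.

```python
# Counts per grade position (poor, fair, good, very good), copied from the
# array output shown earlier; this dict is an assumption, not the real table.
grades = {
    "arun": [1, 4, 2, 1],
    "neha": [3, 2, 1, 4],
    "ram": [1, 1, 3, 0],
    "radha": [0, 3, 1, 4],
}

# Repeat each grade position x once per counted exam, like the second
# generate_series(1, grades[x]) join. A count of 0 contributes no rows.
expanded = [
    (name, x)
    for name in sorted(grades)
    for x, qty in enumerate(grades[name], start=1)  # SQL arrays are 1-based
    for _ in range(qty)
]

print(len(expanded))
```

The list has one entry per exam, so its length should match the row count of the query above.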
    

    Now we GROUP BY name and use the ordered-set aggregate function percentile_disc to finish the job.

    SELECT name, percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)
    FROM (
      SELECT name,
        ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
      FROM grades
    ) AS t(name, grades)
    CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
    CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
    GROUP BY name ORDER BY name;
    
     name  | percentile_disc 
    -------+-----------------
     arun  |               2
     neha  |               2
     radha |               3
     ram   |               3
    (4 rows)
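
To see why these medians come out the way they do, here is a rough Python sketch of what percentile_disc computes: the first value in sorted order whose position, as a fraction of the row count, reaches the requested fraction. The function names and the `grades` dict are hypothetical, and the dict is copied from the output earlier in this answer. It is not PostgreSQL's implementation.

```python
import math

# Histogram counts per student (poor, fair, good, very good); assumed data
# copied from the array output earlier in this answer.
grades = {
    "arun": [1, 4, 2, 1],
    "neha": [3, 2, 1, 4],
    "ram": [1, 1, 3, 0],
    "radha": [0, 3, 1, 4],
}

def expand(counts):
    """Turn a histogram into one value per observation (1-based grade index)."""
    return [x for x, qty in enumerate(counts, start=1) for _ in range(qty)]

def percentile_disc(fraction, values):
    """Return the first sorted value whose rank / n is >= fraction."""
    ordered = sorted(values)
    if not ordered:
        return None  # no rows to aggregate
    rank = max(1, math.ceil(fraction * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

medians = {name: percentile_disc(0.5, expand(c)) for name, c in grades.items()}
print(medians)
```

For neha, with 10 exams, the rank is 5 and the fifth sorted value is 2. That is why a tie between 2 and 3 resolves to the lower grade.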
    

    To go further and make it pretty, use the percentile as an index into an array of labels:

    SELECT name, (ARRAY['Poor', 'Fair', 'Good', 'Very Good'])[percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)]
    FROM (
      SELECT name,
        ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
      FROM grades
    ) AS t(name, grades)
    CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
    CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
    GROUP BY name
    ORDER BY name;
    
     name  | array 
    -------+-------
     arun  | Fair
     neha  | Fair
     radha | Good
     ram   | Good
    (4 rows)
    

    We get slightly more varied output if we add a new user with lopsided grades and rerun the query above.

    INSERT INTO grades (name,grade_poor,grade_fair,grade_good,grade_vgood)
    VALUES ('Bob', 0,0,0,100);
    
     name  |   array   
    -------+-----------
     arun  | Fair
     Bob   | Very Good
     neha  | Fair
     radha | Good
     ram   | Good
    (5 rows)
    
  • 2020-12-07 06:12
    SELECT name, exams,
           CASE WHEN 0.5 * exams <= grade_poor
                    THEN 'grade_poor'
                WHEN 0.5 * exams <= grade_poor + grade_fair
                    THEN 'grade_fair'
                WHEN 0.5 * exams <= grade_poor + grade_fair + grade_good
                    THEN 'grade_good'
                ELSE 'grade_vgood' END AS median_grade
    FROM grades;
    

    This rounds ties down so neha will score "grade_fair" and radha will score "grade_good". If you want to round up, change <= into <.
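
The tie behaviour can be checked with a small Python sketch of the same running-total comparison. It assumes that exams equals the sum of the four grade counts, which the truncated table definition does not confirm. The `median_grade` function and its `round_up` flag are hypothetical names for this example.

```python
from itertools import accumulate

LABELS = ["grade_poor", "grade_fair", "grade_good", "grade_vgood"]

def median_grade(counts, round_up=False):
    """Walk the running totals until half of the exams are covered."""
    half = 0.5 * sum(counts)  # assumption: exams == sum of the four counts
    for label, running in zip(LABELS, accumulate(counts)):
        # "<=" rounds ties down, "<" rounds them up, as in the CASE expression.
        reached = half < running if round_up else half <= running
        if reached:
            return label
    return LABELS[-1]  # mirrors the ELSE branch

print(median_grade([3, 2, 1, 4]))                 # neha, ties rounded down
print(median_grade([3, 2, 1, 4], round_up=True))  # neha, ties rounded up
```

Unlike the first answer, this never expands the histogram into one row per exam, so the work per student stays constant no matter how many exams were taken.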
