Calculating percentile rankings in MS SQL

清酒与你 2020-12-14 02:22

What's the best way to calculate percentile rankings (e.g. the 90th percentile or the median score) in MSSQL 2005?

I'd like to be able to select the 25th, median,

8 answers
  • 2020-12-14 02:45

    I've been working on this a little more, and here's what I've come up with so far:

    CREATE PROCEDURE [dbo].[TestGetPercentile]
    
    @percentile as float,
    @resultval as float output
    
    AS
    
    BEGIN
    
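    -- Give each score a fractional rank of row_number / (row_count + 1),
    -- along with the ranks of the rows immediately before and after it.
    WITH scores(score, prev_rank, curr_rank, next_rank) AS (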
        SELECT dblScore,
            (ROW_NUMBER() OVER ( ORDER BY dblScore ) - 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1)  [prev_rank],
            (ROW_NUMBER() OVER ( ORDER BY dblScore ) + 0.0) / ((SELECT COUNT(*) FROM TestScores) + 1)  [curr_rank],
            (ROW_NUMBER() OVER ( ORDER BY dblScore ) + 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1)  [next_rank]
        FROM TestScores
    )
    
    SELECT @resultval = (
        SELECT TOP 1 
        CASE WHEN t1.score = t2.score
            THEN t1.score
        ELSE
            t1.score + (t2.score - t1.score) * ((@percentile - t1.curr_rank) / (t2.curr_rank - t1.curr_rank))
        END
        FROM scores t1, scores t2
        WHERE (t1.curr_rank = @percentile OR (t1.curr_rank < @percentile AND t1.next_rank > @percentile))
            AND (t2.curr_rank = @percentile OR (t2.curr_rank > @percentile AND t2.prev_rank < @percentile))
    )
    
    END
    

    Then in another stored procedure I do this:

    DECLARE @pct25 float;
    DECLARE @pct50 float;
    DECLARE @pct75 float;
    
    exec TestGetPercentile .25, @pct25 output
    exec TestGetPercentile .50, @pct50 output
    exec TestGetPercentile .75, @pct75 output
    
    Select
        min(dblScore) as minScore,
        max(dblScore) as maxScore,
        avg(dblScore) as avgScore,
        @pct25 as percentile25,
        @pct50 as percentile50,
        @pct75 as percentile75
    From TestScores
    

    It still doesn't do quite what I'm looking for. This gets the stats across all tests combined, whereas I would like to select from a TestScores table that holds several different tests and get back the same stats for each test separately (like I have in my example table in my question).
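
    One possible way to get a value per test is to partition the ranking by test instead of ranking the whole table. The following is only a sketch under assumptions: it supposes TestScores has a column identifying the test, here given the hypothetical name TestID, and it assumes 0 < @percentile < 1. It uses the same rank / (n + 1) convention as the procedure above, but it ignores NULL scores and clamps to the lowest or highest score when the requested percentile falls outside the ranked range.

    ```sql
    -- Sketch only. TestID is a hypothetical column name; TestScores and dblScore come from the code above.
    DECLARE @percentile float;
    SET @percentile = 0.25;  -- assumed to be strictly between 0 and 1

    WITH ranked AS (
        -- Number the scores within each test and count the rows per test.
        SELECT TestID,
               dblScore,
               ROW_NUMBER() OVER (PARTITION BY TestID ORDER BY dblScore) AS rn,
               COUNT(*)     OVER (PARTITION BY TestID)                   AS cnt
        FROM TestScores
        WHERE dblScore IS NOT NULL
    ),
    pos AS (
        -- Fractional row position of the requested percentile within each test.
        SELECT TestID, dblScore, rn, cnt,
               @percentile * (cnt + 1) AS target
        FROM ranked
    )
    SELECT lo.TestID,
           -- Interpolate linearly between the row at FLOOR(target) and the next row.
           lo.dblScore
             + (COALESCE(hi.dblScore, lo.dblScore) - lo.dblScore)
               * CASE WHEN lo.target < 1 THEN 0 ELSE lo.target - FLOOR(lo.target) END AS percentileValue
    FROM pos lo
    LEFT JOIN pos hi
           ON hi.TestID = lo.TestID AND hi.rn = lo.rn + 1
    WHERE lo.rn = CASE WHEN lo.target < 1 THEN 1 ELSE FLOOR(lo.target) END
    ORDER BY lo.TestID;
    ```

    This returns one row per test for a single percentile. To get the 25th, 50th and 75th side by side with min, max and avg, the query would have to be run once per percentile and the results joined on the test column.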

  • 2020-12-14 02:51

    The 50th percentile is the same as the median. To compute another percentile, say the 80th, take the bottom 80 percent of the data sorted in ascending order and the top 20 percent sorted in descending order, then average the two values where those sets meet: the largest value of the first set and the smallest value of the second.

    NB: The median query has been around for a long time and I cannot remember exactly where I got it from. I have only amended it to compute other percentiles.

    DECLARE @Temp TABLE(Id INT IDENTITY(1,1), DATA DECIMAL(10,5))
    
    INSERT INTO @Temp VALUES(0)
    INSERT INTO @Temp VALUES(2)
    INSERT INTO @Temp VALUES(8)
    INSERT INTO @Temp VALUES(4)
    INSERT INTO @Temp VALUES(3)
    INSERT INTO @Temp VALUES(6)
    INSERT INTO @Temp VALUES(6)
    INSERT INTO @Temp VALUES(6) 
    INSERT INTO @Temp VALUES(7)
    INSERT INTO @Temp VALUES(0)
    INSERT INTO @Temp VALUES(1)
    INSERT INTO @Temp VALUES(NULL)
    
    
    --50th percentile or median
    SELECT ((
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP 50 PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA
                    ) AS A
            ORDER BY DATA DESC) + 
            (
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP 50 PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA DESC
                    ) AS A
            ORDER BY DATA ASC)) / 2.0
    
    
    --90th percentile 
    SELECT ((
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP 90 PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA
                    ) AS A
            ORDER BY DATA DESC) + 
            (
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP 10 PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA DESC
                    ) AS A
            ORDER BY DATA ASC)) / 2.0
    
    
    --75th percentile
    SELECT ((
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP 75 PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA
                    ) AS A
            ORDER BY DATA DESC) + 
            (
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP 25 PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA DESC
                    ) AS A
            ORDER BY DATA ASC)) / 2.0
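
    The three queries above differ only in the two TOP percentages, so they could probably be folded into one parameterized query. The following is a sketch under the assumption that TOP accepts a parenthesized expression with PERCENT (available from SQL Server 2005 on). The variable name @pct is hypothetical and holds the percentile as a percentage, e.g. 90 for the 90th percentile.

    ```sql
    -- Sketch only: @pct is a hypothetical variable, not part of the queries above.
    DECLARE @pct float;
    SET @pct = 90;

    -- Same sample data as above, repeated so this batch runs on its own.
    DECLARE @Temp TABLE(Id INT IDENTITY(1,1), DATA DECIMAL(10,5));
    INSERT INTO @Temp (DATA)
    SELECT 0 UNION ALL SELECT 2 UNION ALL SELECT 8 UNION ALL SELECT 4 UNION ALL
    SELECT 3 UNION ALL SELECT 6 UNION ALL SELECT 6 UNION ALL SELECT 6 UNION ALL
    SELECT 7 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT NULL;

    SELECT ((
            -- Largest value within the bottom @pct percent
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP (@pct) PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA
                    ) AS A
            ORDER BY DATA DESC) +
            (
            -- Smallest value within the top (100 - @pct) percent
            SELECT TOP 1 DATA
            FROM   (
                    SELECT  TOP (100 - @pct) PERCENT DATA
                    FROM    @Temp
                    WHERE   DATA IS NOT NULL
                    ORDER BY DATA DESC
                    ) AS A
            ORDER BY DATA ASC)) / 2.0 AS percentileValue;
    ```

    Changing the SET line to 50 or 75 should reproduce the median and 75th percentile queries without copying the whole statement.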
    