Is there a better way to calculate the median (not average)

我在风中等你 2021-02-02 15:38

Suppose I have the following table definition:

CREATE TABLE x (i serial primary key, value integer not null);

I want to calculate the MEDIAN o

7 answers
  •  Happy的楠姐
    2021-02-02 16:20

    A simpler query for that:

    -- y: non-NULL values numbered in ascending order
    WITH y AS (
       SELECT value, row_number() OVER (ORDER BY value) AS rn
       FROM   x
       WHERE  value IS NOT NULL
       )
    -- c: number of non-NULL rows
    , c AS (SELECT count(*) AS ct FROM y)
    -- even count: average the two middle rows; odd count: take the middle row
    SELECT CASE WHEN c.ct%2 = 0 THEN
              round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
           ELSE
              (SELECT value FROM y WHERE y.rn = (c.ct+1)/2)
           END AS median
    FROM   c;
    

    Major points

    • NULL values are ignored.
    • The core feature is the row_number() window function, which has been available since PostgreSQL 8.4.
    • The final SELECT returns the single middle row for an odd row count and the avg() of the two middle rows for an even row count. The result is numeric; in the even case it is rounded to 3 decimal places.
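
    To make the row selection concrete, here is a small sketch on a throwaway table. The table name demo and its values are made up for illustration and are not part of the question.

    ```sql
    -- Hypothetical demo data: four non-NULL values plus one NULL
    CREATE TEMP TABLE demo (value int);
    INSERT INTO demo VALUES (1), (3), (5), (8), (NULL);

    -- Step 1: number the non-NULL values in ascending order
    -- (yields rn = 1..4 for the values 1, 3, 5, 8)
    SELECT value, row_number() OVER (ORDER BY value) AS rn
    FROM   demo
    WHERE  value IS NOT NULL;

    -- Step 2: the count is 4 (even), so the two middle rows are rn 2 and 3;
    -- their values 3 and 5 are averaged and rounded to 3 decimal places
    SELECT round(avg(value), 3) AS median
    FROM  (
       SELECT value, row_number() OVER (ORDER BY value) AS rn
       FROM   demo
       WHERE  value IS NOT NULL
       ) y
    WHERE  y.rn IN (2, 3);
    ```

    With a fifth non-NULL value the count would be odd, and the query would instead return the single row with rn = 3.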

    A test shows that the new version is 4x faster than the query in the question and, unlike it, yields correct results:

    CREATE TEMP TABLE x (value int);
    INSERT INTO x SELECT generate_series(1,10000);
    INSERT INTO x VALUES (NULL),(NULL),(NULL),(3);
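
    One way to time the query against this test data, as a sketch, is to prefix it with EXPLAIN ANALYZE (or to switch on \timing in psql). It assumes the temp table x above has already been created and filled; the timings will differ from machine to machine.

    ```sql
    -- Runs the query and reports the actual execution time along with the plan
    EXPLAIN ANALYZE
    WITH y AS (
       SELECT value, row_number() OVER (ORDER BY value) AS rn
       FROM   x
       WHERE  value IS NOT NULL
       )
    , c AS (SELECT count(*) AS ct FROM y)
    SELECT CASE WHEN c.ct%2 = 0 THEN
              round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
           ELSE
              (SELECT value FROM y WHERE y.rn = (c.ct+1)/2)
           END AS median
    FROM   c;
    ```

    Run without EXPLAIN ANALYZE, the query should return 5000 for this data: there are 10001 non-NULL rows, so the median is row 5001 in sorted order, and the duplicate 3 shifts every later value up by one position.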
    
