Quartiles in SQL query

面向向阳花 2020-12-11 07:44

I have a very simple table like this:

CREATE TABLE IF NOT EXISTS LuxLog (
  Sensor TINYINT,
  Lux INT,
  PRIMARY KEY(Sensor)
)

It contains

6 answers
  • 2020-12-11 07:57

    See the SQLFiddle: http://sqlfiddle.com/#!9/accca6/2/6. Note: for the fiddle I generated 100 rows, one for each integer between 1 and 100, inserted in random order (shuffled in Excel).

    Here is the code:

    SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
    SET @quartile := (ROUND(@number_of_rows*0.25));
    SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
    SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
    SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
    PREPARE stmt1 FROM @sql;
    EXECUTE stmt1;
    

    EDIT: the same query restricted to a single sensor:

    SET @current_sensor := 101;
    SET @quartile := (ROUND((SELECT COUNT(*) FROM LuxLog WHERE Sensor = @current_sensor)*0.25));
    SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
    SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
    SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
    PREPARE stmt1 FROM @sql;
    EXECUTE stmt1;
    

    The underlying reasoning is as follows. For quartile 1 we want the value 25% of the way down from the top, so we first need to know how many rows there are:

    SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
    

    Now that we know the number of rows, we compute 25% of it:

    SET @quartile := (ROUND(@number_of_rows*0.25));
    

    Then, to find a quartile, we order the LuxLog table by Lux and pick the row at position @quartile. Setting OFFSET to @quartile skips the first @quartile rows, and LIMIT 1 retrieves only the one row that follows:

    SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
    

    We do (almost) the same for the other quartile, but rather than starting from the top (from higher values to lower) we start from the bottom (hence the ASC).

    At this point we only have strings stored in the variables @sql_q1 and @sql_q3, so we concatenate them with UNION to combine the results of both queries, then prepare the statement and execute it.
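The same steps (count the rows, take 25% of that, skip that many rows in sorted order) can be tried outside MySQL. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory table holding the integers 1 to 100, like the fiddle, and the helper name `value_at_offset` is made up for the illustration. The offset is passed as a bound parameter here, so no string concatenation is needed.

```python
import random
import sqlite3


def value_at_offset(conn, offset, descending):
    """Return the Lux value found after skipping `offset` rows in sorted order."""
    direction = "DESC" if descending else "ASC"
    row = conn.execute(
        f"SELECT Lux FROM LuxLog ORDER BY Lux {direction} LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()
    return row[0]


# Assumed test data: one row per integer 1..100, inserted in random order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LuxLog (Sensor INTEGER, Lux INTEGER)")
values = list(range(1, 101))
random.shuffle(values)
conn.executemany("INSERT INTO LuxLog VALUES (101, ?)", [(v,) for v in values])

number_of_rows = conn.execute("SELECT COUNT(*) FROM LuxLog").fetchone()[0]
# int(x + 0.5) rounds halves up, like MySQL's ROUND() does for positive numbers.
quartile = int(number_of_rows * 0.25 + 0.5)

from_top = value_at_offset(conn, quartile, descending=True)
from_bottom = value_at_offset(conn, quartile, descending=False)
print(from_top, from_bottom)  # 75 26
```

With 100 rows the offset is 25, so the descending query returns the 26th value from the top (75) and the ascending query the 26th value from the bottom (26).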

  • 2020-12-11 07:58

    Here's a query I came up with for calculating quartiles; it runs in ~0.04s on a table of ~5000 rows. I included the min/max values because I am ultimately using this data to build the four quartile ranges:

       SELECT percentile_table.percentile, avg(ColumnName) AS percentile_values
        FROM   
            (SELECT @rownum := @rownum + 1 AS `row_number`, 
                       d.ColumnName 
                FROM   PercentileTestTable d, 
                       (SELECT @rownum := 0) r 
                WHERE  ColumnName IS NOT NULL 
                ORDER  BY d.ColumnName
            ) AS t1, 
            (SELECT count(*) AS total_rows 
                FROM   PercentileTestTable d 
                WHERE  ColumnName IS NOT NULL 
            ) AS t2, 
            (SELECT 0 AS percentile 
                UNION ALL 
                SELECT 0.25
                UNION ALL 
                SELECT 0.5
                UNION ALL 
                SELECT 0.75
                UNION ALL 
                SELECT 1
            ) AS percentile_table  
        WHERE  
            (percentile_table.percentile != 0 
                AND percentile_table.percentile != 1 
                AND t1.row_number IN 
                ( 
                    floor(( total_rows + 1 ) * percentile_table.percentile), 
                    floor(( total_rows + 2 ) * percentile_table.percentile)
                ) 
            ) OR (
                percentile_table.percentile = 0 
                AND t1.row_number = 1
            ) OR (
                percentile_table.percentile = 1 
                AND t1.row_number = total_rows
            )
        GROUP BY percentile_table.percentile; 
    

    Fiddle here: http://sqlfiddle.com/#!9/58c0e2/1

    There are certainly performance issues; I'd love if anyone has feedback on how to improve this.

    Sample data list:

     3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
    

    Sample query output:

    | percentile | percentile_values |
    |------------|-------------------|
    |          0 |                 3 |
    |       0.25 |                 4 |
    |        0.5 |              10.5 |
    |       0.75 |                15 |
    |          1 |                18 |
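
To check the row-selection rule by hand, here is a rough Python rendering of the arithmetic in the query above. It is a sketch, not the query itself: the function name `percentile_value` is made up, and it assumes the list is already sorted and free of NULLs. For percentiles strictly between 0 and 1 it averages the rows numbered floor((n + 1) * p) and floor((n + 2) * p), just as the WHERE clause does.

```python
import math


def percentile_value(sorted_values, p):
    """Average the rows the query would select for percentile p (1-based row numbers)."""
    n = len(sorted_values)
    if p == 0:
        return sorted_values[0]
    if p == 1:
        return sorted_values[-1]
    # A set keeps a single row when both floors give the same row number,
    # which matches how `row_number IN (a, b)` behaves when a = b.
    rows = {math.floor((n + 1) * p), math.floor((n + 2) * p)}
    picked = [sorted_values[r - 1] for r in rows if 1 <= r <= n]
    if not picked:
        return None  # no row matches, so the query would return no group
    return sum(picked) / len(picked)


data = [3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18]
results = [percentile_value(data, p) for p in (0, 0.25, 0.5, 0.75, 1)]
print(results)  # [3, 4.0, 10.5, 15.0, 18]
```

For the 12 sample values this selects row 3 for the 0.25 percentile, rows 6 and 7 for the median, and rows 9 and 10 for the 0.75 percentile, which reproduces the sample output.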
    
  • 2020-12-11 08:07

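    Something like this should do it. Note that it compares Lux directly with fractions of the row count, so it relies on the Lux values running from 1 to the number of rows, as in the example data below: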

    select
        ll.*,
        if (a.position is not null, 1,
            if (b.position is not null, 2, 
            if (c.position is not null, 3, 
            if (d.position is not null, 4, 0)))
        ) as quartile
    from
        luxlog ll
        left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
        left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
        left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
        left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
    ;    
    

    Here's the complete example:

    use example;
    
    drop table if exists luxlog;
    
    -- Lowercase name so it matches the other statements on case-sensitive servers
    CREATE TABLE luxlog (
      Sensor TINYINT,
      Lux INT,
      position INT,
      PRIMARY KEY (position)
    );
    
    insert into luxlog values (0, 1, 10);
    insert into luxlog values (0, 2, 20);
    insert into luxlog values (0, 3, 30);
    insert into luxlog values (0, 4, 40);
    insert into luxlog values (0, 5, 50);
    insert into luxlog values (0, 6, 60);
    insert into luxlog values (0, 7, 70);
    insert into luxlog values (0, 8, 80);
    
    select count(*)*.25 from luxlog;
    select count(*)*.50 from luxlog;
    
    select
        ll.*,
        a.position,
        b.position,
        if(
            a.position is not null, 1,
            if (b.position is not null, 2, 0)
        ) as quartile
    from
        luxlog ll
        left outer join luxlog a on ll.position = a.position and a.lux >= (select count(*)*0.00 from luxlog) and a.lux < (select count(*)*0.25 from luxlog)
        left outer join luxlog b on ll.position = b.position and b.lux >= (select count(*)*0.25 from luxlog) and b.lux < (select count(*)*0.50 from luxlog)
        left outer join luxlog c on ll.position = c.position and c.lux >= (select count(*)*0.50 from luxlog) and c.lux < (select count(*)*0.75 from luxlog)
        left outer join luxlog d on ll.position = d.position and d.lux >= (select count(*)*0.75 from luxlog) and d.lux < (select count(*)*1.00 from luxlog)
    ;    
    
    
    select
        ll.*,
        if (a.position is not null, 1,
            if (b.position is not null, 2, 
            if (c.position is not null, 3, 
            if (d.position is not null, 4, 0)))
        ) as quartile
    from
        luxlog ll
        left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
        left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
        left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
        left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
    ;    
    
  • 2020-12-11 08:09

    Using NTILE is very simple, but it is a window function: PostgreSQL has it, and MySQL only added it in version 8.0. You basically just do something like this:

    SELECT value_you_are_NTILING,
        NTILE(4) OVER (ORDER BY value_you_are_NTILING DESC) AS tiles
    FROM
    (SELECT math_that_gives_you_the_value_you_are_NTILING_here AS value_you_are_NTILING FROM tablename) AS t; -- the derived table needs an alias
    

    Here is a simple example I made for you on SQLFiddle: http://sqlfiddle.com/#!15/7f05a/1
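
To make the bucket assignment concrete, here is a small Python sketch of what NTILE does with rows that are already sorted. The function name `ntile` is just for this illustration. When the row count does not divide evenly, the earlier buckets each get one extra row.

```python
def ntile(sorted_values, buckets):
    """Pair each value with its bucket number, 1..buckets, in the given order."""
    base, extra = divmod(len(sorted_values), buckets)
    tiles = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        tiles.extend([bucket] * size)
    return list(zip(sorted_values, tiles))


values = sorted(range(1, 11), reverse=True)  # 10, 9, ..., 1 (ORDER BY ... DESC)
tiled = ntile(values, 4)
print([tile for _, tile in tiled])  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

With 10 rows and 4 buckets the sizes come out as 3, 3, 2, 2.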

    In MySQL versions before 8.0, which have no window functions, you would emulate RANK with user variables. Here is the SQLFiddle for that: http://www.sqlfiddle.com/#!2/d5587/1

    This rank emulation comes from the Stack Overflow question "Rank function in MySQL"; look for the answer by Salman A.

  • 2020-12-11 08:12

    I use this solution with a MySQL function, where:

    x is the centile you want (for example 3/4 for the third quartile)

    array_values is the list of values built with GROUP_CONCAT, ordered and separated by commas

    DROP FUNCTION IF EXISTS centile;
    
    delimiter $$
    CREATE FUNCTION `centile`(x Text, array_values TEXT) RETURNS text
    BEGIN
    
    Declare DIFF_RANK TEXT;
    Declare RANG_FLOOR INT;
    Declare COUNT INT;
    Declare VALEUR_SUP TEXT;
    Declare VALEUR_INF TEXT;
    
    SET COUNT = LENGTH(array_values) - LENGTH(REPLACE(array_values, ',', '')) + 1;
    SET RANG_FLOOR = FLOOR(ROUND((x) * (COUNT-1),2));
    SET DIFF_RANK = ((x) * (COUNT-1)) - FLOOR(ROUND((x) * (COUNT-1),2));
    
    SET VALEUR_SUP = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+2),',',-1) AS DECIMAL);
    SET VALEUR_INF = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+1),',',-1) AS DECIMAL);
    
    /****
        https://fr.wikipedia.org/wiki/Quantile
        x_j+1 + g (x_j+2 - x_j+1)       
    ***/
    RETURN  Round((VALEUR_INF + (DIFF_RANK* (VALEUR_SUP-VALEUR_INF) ) ),2);
    
    END$$
    
    delimiter ;
    

    Example:

    Select centile(3/4,GROUP_CONCAT(lux ORDER BY lux SEPARATOR ',')) as quartile_3
    FROM LuxLog
    WHERE Sensor=12 AND Lux<>0
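
For anyone who wants to check the interpolation by hand, here is a rough Python equivalent of the function body. It is a sketch under assumptions: it takes an already sorted list instead of a comma-separated string, and it clamps the upper index at the last element, which is what I would expect SUBSTRING_INDEX to yield when the requested position is past the end.

```python
import math


def centile(x, sorted_values):
    """Linear interpolation between the two values surrounding rank x * (n - 1)."""
    n = len(sorted_values)
    rank = x * (n - 1)
    rang_floor = math.floor(round(rank, 2))  # 0-based index of the lower value
    diff_rank = rank - rang_floor            # fractional part, the "g" in the formula
    valeur_inf = sorted_values[rang_floor]
    valeur_sup = sorted_values[min(rang_floor + 1, n - 1)]
    return round(valeur_inf + diff_rank * (valeur_sup - valeur_inf), 2)


print(centile(0.75, [1, 2, 3, 4, 5]))    # 4.0
print(centile(0.5, [1, 2, 3, 4]))        # 2.5
print(centile(0.25, [10, 20, 30, 40]))   # 17.5
```

In the last call the rank is 0.75, so the result lies three quarters of the way from 10 to 20.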
    
  • 2020-12-11 08:17

    Or you could use rank like this:

    select
        ll.*,
        @curRank := @curRank + 1 as `rank`, -- quoted because RANK is reserved in MySQL 8.0
        if (@curRank <= (select count(*)*0.25 from luxlog), 1,
            if (@curRank <= (select count(*)*0.50 from luxlog), 2, 
            if (@curRank <= (select count(*)*0.75 from luxlog), 3, 4))
        ) as quartile
    from
        luxlog ll,
        (SELECT @curRank := 0) r
    ;    
    

    And this will give just one record for each quartile:

    select
        x.quartile, group_concat(position)
    from (
        select
            ll.*,
            @curRank := @curRank + 1 as `rank`, -- quoted because RANK is reserved in MySQL 8.0
            if (@curRank > 0 and @curRank <= (select count(*)*0.25 from luxlog), 1,
                if (@curRank > 0 and @curRank <= (select count(*)*0.50 from luxlog), 2, 
                if (@curRank > 0 and @curRank <= (select count(*)*0.75 from luxlog), 3, 4))
            ) as quartile
        from
            luxlog ll,
            (SELECT @curRank := 0) r
    ) x
    group by quartile
    
    + ------------- + --------------------------- +
    | quartile      | group_concat(position)      |
    + ------------- + --------------------------- +
    | 1             | 10,20                       |
    | 2             | 30,40                       |
    | 3             | 50,60                       |
    | 4             | 70,80                       |
    + ------------- + --------------------------- +
    4 rows
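
The bucketing logic is easy to verify outside the database. Below is a minimal Python sketch of the same thresholds. The function name `quartiles_by_rank` is invented for the example, and it assumes the rows arrive in the order you want them ranked.

```python
def quartiles_by_rank(rows):
    """Group rows into quartiles 1..4 by comparing a running rank with n * 0.25, 0.50, 0.75."""
    n = len(rows)
    groups = {}
    for rank, row in enumerate(rows, start=1):
        if rank <= n * 0.25:
            quartile = 1
        elif rank <= n * 0.50:
            quartile = 2
        elif rank <= n * 0.75:
            quartile = 3
        else:
            quartile = 4
        groups.setdefault(quartile, []).append(row)
    return groups


positions = [10, 20, 30, 40, 50, 60, 70, 80]
grouped = quartiles_by_rank(positions)
print(grouped)  # {1: [10, 20], 2: [30, 40], 3: [50, 60], 4: [70, 80]}
```

For the eight example rows this gives the same grouping as the table above.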
    

    EDIT: For the SQLFiddle example (http://sqlfiddle.com/#!9/a14a4/17), I removed (commented out) the following block:

    /*SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
    SET @quartile := (ROUND(@number_of_rows*0.25));
    SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
    SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
    SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
    PREPARE stmt1 FROM @sql;
    EXECUTE stmt1;*/
    
