Calculate Percentile Value using MySQL

Submitted by 荒凉一梦 on 2019-12-24 00:12:52

Question


I have a table which contains thousands of rows and I would like to calculate the 90th percentile for one of the fields, called 'round'.

For example, select the value of round which is at the 90th percentile.

I don't see a straightforward way to do this in MySQL.

Can somebody provide some suggestions as to how I may start this sort of calculation?

Thank you!


Answer 1:


First, let's assume that you have a table with a value column and you want the row holding the 95th percentile value. In other words, you are looking for a value that is greater than 95 percent of all values.
Here is a simple answer:

SELECT *
FROM (
    SELECT t.*, @row_num := @row_num + 1 AS row_num
    FROM YOUR_TABLE t, (SELECT @row_num := 0) counter
    ORDER BY YOUR_VALUE_COLUMN
) temp
WHERE temp.row_num = ROUND(.95 * @row_num);
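The arithmetic behind this query can be sketched outside the database. The Python below is only an illustration: the function name `row_at_percentile` is hypothetical, and it assumes that ROUND rounds ties away from zero for the exact-value product .95 * @row_num.

```python
from decimal import Decimal, ROUND_HALF_UP


def row_at_percentile(values, fraction):
    """Return the value on 1-based row ROUND(fraction * n) of the sorted input, or None."""
    ordered = sorted(values)  # plays the role of ORDER BY YOUR_VALUE_COLUMN
    n = len(ordered)          # final value of @row_num after numbering every row
    # Assumption: ties round away from zero. Python's built-in round() rounds
    # ties to even, so Decimal with ROUND_HALF_UP is used instead.
    product = Decimal(str(fraction)) * n
    target = int(product.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    if 1 <= target <= n:
        return ordered[target - 1]
    return None  # no row number matches, so the query would return no rows


if __name__ == "__main__":
    print(row_at_percentile(range(1, 21), 0.95))
```

With 20 rows the target row is ROUND(19.0) = 19; with 10 rows it is ROUND(9.5) = 10, which is the last row.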



Answer 2:


I had been trying to solve this for quite some time when I found the following answer. Honestly brilliant. It is also quite fast even for big tables: the table I used it on contained about 5 million records and the query needed a couple of seconds.

SELECT
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
        GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
        ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
    AS `95th Per`
FROM table_name;

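As you can imagine, just replace table_name and field_name with your table's and column's names. Keep in mind that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so that setting has to be raised for anything but small tables.

For further information, check Roland Bouman's original post.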
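To see why the nested SUBSTRING_INDEX calls pick out a single value, here is a rough Python stand-in. The helper names `substring_index` and `percentile_from_list` are made up for this sketch, and the stand-in only covers non-zero integer counts; it is not a full reimplementation of the MySQL function.

```python
def substring_index(text, delim, count):
    """Rough stand-in for MySQL SUBSTRING_INDEX, for a non-zero integer count."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # everything before the count-th delimiter
    return delim.join(parts[count:])      # everything after the count-th delimiter from the right


def percentile_from_list(values, position):
    """Pick the entry at a 1-based position from a sorted, comma-joined list."""
    joined = ",".join(str(v) for v in sorted(values))  # GROUP_CONCAT(... ORDER BY ...)
    head = substring_index(joined, ",", position)      # keep the first `position` entries
    return substring_index(head, ",", -1)              # last entry of that prefix


if __name__ == "__main__":
    print(percentile_from_list([5, 3, 1, 4, 2], 4))
```

The result is still a string at this point, which is why the query wraps the expression in CAST.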




Answer 3:


http://www.artfulsoftware.com/infotree/queries.php#68

SELECT  
  a.film_id , 
  ROUND( 100.0 * ( SELECT COUNT(*) FROM film AS b WHERE b.length <= a.length ) / total.cnt, 1 )  
  AS percentile 
FROM film a  
CROSS JOIN (  
  SELECT COUNT(*) AS cnt  
  FROM film  
) AS total 
ORDER BY percentile DESC; 

This can be slow for very large tables, because the correlated COUNT(*) subquery is evaluated once for every row of film.
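As an illustration of what this query computes, the sketch below assigns each value the percentage of values less than or equal to it. The function name `percentile_ranks` is hypothetical, and the nested scan is deliberately naive to mirror the per-row subquery.

```python
def percentile_ranks(lengths):
    """For each value, the percentage of values <= it, rounded to one decimal place."""
    total = len(lengths)  # the cross-joined COUNT(*) AS cnt
    ranks = []
    for a in lengths:
        # One full pass per row, like the correlated subquery: O(n^2) overall.
        at_or_below = sum(1 for b in lengths if b <= a)
        ranks.append(round(100.0 * at_or_below / total, 1))
    return ranks


if __name__ == "__main__":
    print(percentile_ranks([10, 20, 30, 40]))
```

Unlike the other answers, this returns a percentile rank for every row instead of the single value at a given percentile.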




Answer 4:


As per Tony_Pets' answer, but as I noted on a similar question, I had to change the calculation slightly: for the 90th percentile, for example, I used "90/100 * COUNT(*) + 0.5" instead of "90/100 * COUNT(*) + 1". With + 1 it sometimes skipped two values past the percentile point in the ordered list, instead of picking the next higher value for the percentile. This may be down to the way rounding to an integer works in MySQL.

i.e.:

.... SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(fieldValue ORDER BY fieldValue SEPARATOR ','), ',', 90/100 * COUNT(*) + 0.5), ',', -1) as 90thPercentile ....
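The difference between the two offsets can be explored with a small sketch. It rests on an assumption that the answer itself only guesses at: that the fractional count is rounded to the nearest integer, with ties going up, before SUBSTRING_INDEX uses it. The function name `picked_position` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def picked_position(n, percent, offset):
    """1-based list position for percent/100 * n + offset, ASSUMING round-half-up conversion."""
    raw = Decimal(percent) / Decimal(100) * n + Decimal(offset)
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    for n in (10, 15, 17):
        print(n, picked_position(n, 90, "1"), picked_position(n, 90, "0.5"))
```

Under that assumption the two offsets agree for 10 rows (position 10 either way) but differ for 15 rows, where + 1 lands on position 15 and + 0.5 on position 14, which matches the skipping described above.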




Answer 5:


The SQL standard supports the PERCENTILE_DISC and PERCENTILE_CONT inverse distribution functions for precisely this job. Implementations are available in at least Oracle, PostgreSQL, SQL Server, Teradata. Unfortunately not in MySQL. But you can emulate PERCENTILE_DISC in MySQL 8 as follows:

SELECT DISTINCT first_value(my_column) OVER (
  ORDER BY CASE WHEN p <= 0.9 THEN p END DESC /* NULLS LAST */
) x
FROM (
  SELECT
    my_column,
    percent_rank() OVER (ORDER BY my_column) p
  FROM my_table
) t;

This calculates the PERCENT_RANK for each row given your my_column ordering, and then finds the last row for which the percent rank is less than or equal to 0.9.
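The same logic can be traced by hand with a short sketch. It assumes the usual definition of percent rank, (rank - 1) / (rows - 1), with a single row getting 0; the function name `percentile_disc_emulated` is made up for the example, and it mirrors the query above, not any particular database's PERCENTILE_DISC.

```python
from bisect import bisect_left


def percentile_disc_emulated(values, fraction=0.9):
    """Return the largest value whose percent rank is <= fraction, or None if empty."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for v in ordered:
        # bisect_left counts the values strictly below v, which equals rank - 1.
        p = 0.0 if n == 1 else bisect_left(ordered, v) / (n - 1)
        if p <= fraction:
            best = v  # ascending scan, so the last hit has the highest qualifying rank
    return best


if __name__ == "__main__":
    print(percentile_disc_emulated([1, 2, 3, 4, 5], 0.9))
```

For the five values 1 to 5 the percent ranks are 0, 0.25, 0.5, 0.75 and 1, so the value returned for 0.9 is 4.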

This only works on MySQL 8+, which has window function support.



Source: https://stackoverflow.com/questions/19770026/calculate-percentile-value-using-mysql
