I\'m having trouble with calculating the median of a list of values, not the average.
I found this article Simple way to calculate median with MySQL
It has a
val
is your time column, x
and y
are two references to the data table (you can write data AS x, data AS y
).
EDIT: To avoid computing your sums twice, you can store the intermediate results.
CREATE TEMPORARY TABLE average_user_total_time
(SELECT SUM(time) AS time_taken
FROM scores
WHERE created_at >= '2010-10-10'
and created_at <= '2010-11-11'
GROUP BY user_id);
Then you can compute median over these values which are in a named table.
EDIT: Temporary table won't work here. You could try using a regular table with "MEMORY" table type. Or just have your subquery that computes the values for the median twice in your query. Apart from this, I don't see another solution. This doesn't mean there isn't a better way, maybe somebody else will come with an idea.
If you have a table R
with a column named A
, and you want the median of A, you can do as follows:
SELECT A FROM R R1
WHERE ( SELECT COUNT(A) FROM R R2 WHERE R2.A < R1.A ) = ( SELECT COUNT(A) FROM R R3 WHERE R3.A > R1.A )
Note: This will only work if there are no duplicated values in A. Also, null values are not allowed.
First try to understand what the median is: it is the middle value in the sorted list of values.
Once you understand that, the approach is two steps:
Example:
Median of 0 1 3 7 9 10: 5 (because (7+3)/2=5)
Median of 0 1 3 7 9 10 11: 7 (because 7 is the middle value)
So, to sort dates you need a numerical value; you can get their time stamp (as seconds elapsed from epoch) and use the definition of median.
Simplest ways me and my friend have found out... ENJOY!!
SELECT count(*) INTO @c from station;
select ROUND((@c+1)/2) into @final;
SELECT round(lat_n,4) from station a where @final-1=(select count(lat_n) from station b where b.lat_n > a.lat_n);
Here is a solution that is easy to understand. Just replace Your_Column and Your_Table as per your requirement.
SET @r = 0;
SELECT AVG(Your_Column)
FROM (SELECT (@r := @r + 1) AS r, Your_Column FROM Your_Table ORDER BY Your_Column) Temp
WHERE
r = (SELECT CEIL(COUNT(*) / 2) FROM Your_Table) OR
r = (SELECT FLOOR((COUNT(*) / 2) + 1) FROM Your_Table)
Originally adopted from this thread.
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
This will find an arbitrary percentile by replacing the COUNT(*)/2
with COUNT(*)*n
where n
is the percentile (.5 for median, .75 for 75th percentile, etc).