Finding the Median value from a table, Group By Date SQLServer

Submitted by 喜欢而已 on 2021-02-17 05:22:10

Question


I have a complicated problem I am trying to solve. Please bear with me and feel free to ask any questions. I am quite new to SQL and am having difficulty with this.

I need to calculate the median of a group of values. The values are not stored in a table directly; they are derived from a table as hourly occurrence counts grouped by date.

Here's the sample table the data is pulled from.

CREATE TABLE Table22(
    Request_Number        BIGINT   NOT NULL
   ,Request_Received_Date DATETIME NOT NULL
);
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016311446,'8/9/16 9:56');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163612157,'9/6/16 9:17');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016384250,'9/12/16 14:52');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20162920101,'4/19/16 8:11');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016418170,'10/6/16 12:28');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016392953,'9/6/16 12:39');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20164123416,'10/6/16 15:05');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016335972,'8/9/16 7:49');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20162622951,'9/6/16 9:57');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163913504,'9/6/16 9:47');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163211326,'9/6/16 12:38');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163610132,'8/30/16 16:34');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20164119560,'10/6/16 15:53');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016334416,'8/10/16 11:06');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20164320028,'10/6/16 15:27');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163515193,'8/24/16 19:50');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016159834,'4/19/16 13:21');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016178443,'4/19/16 13:05');

The table has two columns: Request_Number and Request_Received_Date. Request_Number is not unique and is largely irrelevant here. I want to know how many requests were received on each date, broken down by hour within that date (24 hours). Each row that falls in a given date and hour counts as one occurrence (TicketCount). I can use COUNT(*) on the table, grouped by the date and the hour of Request_Received_Date.

I did just that and created a temporary table within my script:

CREATE TABLE #z (ForDate date, OnHour int, TicketCount int);

INSERT INTO #z (ForDate, OnHour, TicketCount)
SELECT  CAST(Request_Received_Date AS date)   AS ForDate,
        DATEPART(hh, Request_Received_Date)   AS OnHour,
        COUNT(*)                              AS TicketCount /* hourly ticket count */
FROM Table22
GROUP BY CAST(Request_Received_Date AS date), DATEPART(hh, Request_Received_Date)
ORDER BY ForDate DESC, OnHour ASC;

SELECT * FROM #z order by ForDate Desc, OnHour ASC

Now I am having the hardest time finding the median of the hourly counts for each day. I have tried many different formulas for calculating a median and was able to make most of them work. Many examples of median calculation can be found here: https://sqlperformance.com/2012/08/t-sql-queries/median

I like the script below because it is simple. However, it computes a single median across all rows of #z, that is, across every date at once. I cannot find a way to make it group by date.

DECLARE @Median DECIMAL (12,2);

SELECT @Median = (
    (SELECT MAX(TicketCount) FROM
    (SELECT TOP 50 PERCENT TicketCount FROM #z ORDER BY TicketCount) AS BottomHalf)
    +
    (SELECT MIN(TicketCount) FROM
    (SELECT TOP 50 PERCENT TicketCount FROM #z ORDER BY TicketCount DESC) AS TopHalf)) / 2.0; -- 2.0 avoids integer division on the int TicketCount values

SELECT @Median;

Any help will be really appreciated.

The expected result is something like this:

ForDate      Median
10/6/2016    2
9/12/2016    1
9/6/2016     2.5
8/30/2016    1
8/24/2016    1
8/10/2016    1
8/9/2016     1
4/19/2016    1.5

Answer 1:


How about something like this? (It only applies if you use SQL Server 2012 or later.)

SELECT DISTINCT ForDate, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TicketCount) OVER (PARTITION BY ForDate) AS Median
FROM #z;
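The temporary table is not strictly required. As a sketch, assuming the Table22 definition from the question, the hourly counts can be built in a common table expression and the median taken in the same statement. The CTE name Hourly is only illustrative and does not come from the original answer.

```sql
-- Sketch: hourly counts and per-date median in one statement (SQL Server 2012 or later).
-- "Hourly" is a hypothetical CTE name; Table22 is the sample table from the question.
WITH Hourly AS (
    SELECT CAST(Request_Received_Date AS date)   AS ForDate,
           DATEPART(hour, Request_Received_Date) AS OnHour,
           COUNT(*)                              AS TicketCount
    FROM Table22
    GROUP BY CAST(Request_Received_Date AS date),
             DATEPART(hour, Request_Received_Date)
)
SELECT DISTINCT
       ForDate,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TicketCount)
           OVER (PARTITION BY ForDate) AS Median
FROM Hourly
ORDER BY ForDate DESC;
```

Note that only hours with at least one request produce a row, so hours with zero requests do not enter the median. That matches the expected result in the question.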

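In short, SQL Server has two functions for calculating a median: PERCENTILE_CONT and PERCENTILE_DISC. PERCENTILE_CONT interpolates between the two middle values when the number of rows is even, while PERCENTILE_DISC always returns a value that actually occurs in the set. You can read about them here: https://msdn.microsoft.com/en-us/library/hh231327.aspx

You can compare the two on this data with the following query: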
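To make the difference between PERCENTILE_CONT and PERCENTILE_DISC concrete, here is a minimal sketch using only the two hourly counts that the sample data produces for 9/6/2016 (3 requests in hour 9 and 2 requests in hour 12). The inline VALUES list and the alias v are assumptions made for illustration.

```sql
-- Sketch: the two hourly counts for 9/6/2016, supplied inline instead of read from #z.
SELECT DISTINCT
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v.TicketCount) OVER () AS MedianDisc, -- 2: an actual value from the set
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.TicketCount) OVER () AS MedianCont  -- 2.5: interpolated between 2 and 3
FROM (VALUES (2), (3)) AS v(TicketCount);
```

Since the expected result in the question lists 2.5 for 9/6/2016, PERCENTILE_CONT is the variant that matches it.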

SELECT DISTINCT
    ForDate
    , PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY TicketCount) OVER (PARTITION BY ForDate) AS MedianDisc
    , PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TicketCount) OVER (PARTITION BY ForDate) AS MedianCont
FROM
    #z;
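If PERCENTILE_CONT is not available (versions before SQL Server 2012), one common alternative is to number the rows within each date and average the middle row, or the two middle rows when the count is even. The following is a sketch of that pattern and is not part of the original answer; it assumes the #z table from the question, and the names Ranked, rn and cnt are illustrative.

```sql
-- Sketch: per-date median without PERCENTILE_CONT, using ROW_NUMBER and a windowed COUNT.
-- If this follows another statement in the same batch, that statement must end with a semicolon.
WITH Ranked AS (
    SELECT ForDate,
           TicketCount,
           ROW_NUMBER() OVER (PARTITION BY ForDate ORDER BY TicketCount) AS rn,  -- position within the date
           COUNT(*)     OVER (PARTITION BY ForDate)                      AS cnt  -- rows for the date
    FROM #z
)
SELECT ForDate,
       AVG(1.0 * TicketCount) AS Median   -- 1.0 * forces decimal arithmetic on the int column
FROM Ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2) -- integer division: same row twice when cnt is odd, the two middle rows when even
GROUP BY ForDate
ORDER BY ForDate DESC;
```

For a date with two hourly counts (cnt = 2) the filter keeps rows 1 and 2 and averages them; for a date with a single hourly count it keeps row 1 only.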


Source: https://stackoverflow.com/questions/40981077/finding-the-median-value-from-a-table-group-by-date-sqlserver
