percentile | 易学教程

How to calculate the percentile?

Question: I have access logs such as the ones below stored in a MongoDB instance:

    Time                           Service                      Latency
    [27/08/2013:11:19:22 +0000]    "POST Service A HTTP/1.1"    403
    [27/08/2013:11:19:24 +0000]    "POST Service B HTTP/1.1"    1022
    [27/08/2013:11:22:10 +0000]    "POST Service A HTTP/1.1"    455

Is there an analytics function like PERCENTILE_DISC in Oracle to calculate the percentile? I would like to calculate latency percentiles over a period of time.

Answer 1: There still appears to be no native way to calculate percentiles but by
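
A discrete percentile in the spirit of PERCENTILE_DISC can also be computed on the client side. The following is a minimal sketch, assuming the latency values have already been read out of the database into a Python list; the function name `percentile_disc` is hypothetical and the nearest-rank rule shown (smallest value whose cumulative share reaches p) is one common definition, not a statement about any database's internals.

```python
import math

def percentile_disc(values, p):
    """Nearest-rank percentile: smallest value whose cumulative share is >= p."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    # 1-based rank of the first element whose cumulative fraction reaches p.
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

# Latencies taken from the three sample log lines in the question.
latencies = [403, 1022, 455]
median_latency = percentile_disc(latencies, 0.5)
p95_latency = percentile_disc(latencies, 0.95)
```

Because the result is always one of the observed values, no interpolation is involved; with only three samples the 95th percentile is simply the largest latency.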

Panda rolling window percentile rank

Question: I am trying to calculate the percentile rank of data by column within a rolling window.

    test = pd.DataFrame(np.random.randn(20, 3), pd.date_range('1/1/2000', periods=20), ['A', 'B', 'C'])
    test
    Out[111]:
                       A         B         C
    2000-01-01 -0.566992 -1.494799  0.462330
    2000-01-02 -0.550769 -0.699104  0.767778
    2000-01-03 -0.270597  0.060836  0.057195
    2000-01-04 -0.583784 -0.546418 -0.557850
    2000-01-05  0.294073 -2.326211  0.262098
    2000-01-06 -1.122543 -0.116279 -0.003088
    2000-01-07  0.121387  0.763100  3.503757
    2000-01-08 0
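
One way to approach the rolling percentile rank is to apply a small function to each trailing window. This is a sketch under stated assumptions: "percentile rank" is taken to mean the fraction of values in the window that are less than or equal to the newest value, and the helper name `rolling_pct_rank` and the small `demo` frame are hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_pct_rank(frame, window):
    """Percentile rank of each value within its trailing window, column by column."""
    # raw=True hands each window to the lambda as a NumPy array;
    # x[-1] is the newest value in that window.
    return frame.rolling(window).apply(lambda x: (x <= x[-1]).mean(), raw=True)

demo = pd.DataFrame({"A": [1.0, 3.0, 2.0, 5.0, 4.0]},
                    index=pd.date_range("1/1/2000", periods=5))
ranks = rolling_pct_rank(demo, window=3)
```

The first `window - 1` rows are NaN because a full window is not yet available; other tie-handling or ranking conventions would need a different lambda.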

Which method does pandas use for percentile?

Question: I was trying to understand the lower/upper percentile calculation in pandas and got a bit confused. Here is the sample code and the output for it.

    test = pd.Series([7, 15, 36, 39, 40, 41])
    test.describe()

output:

I am interested only in the 25% and 75% percentiles, and I wonder which method pandas uses to calculate them. According to the article at https://en.wikipedia.org/wiki/Quartile, the results are different, as follows:

So what statistical/mathematical method does pandas use to calculate percentiles?

Answer 1:
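
The discrepancy in the question is typical of two quartile conventions giving different numbers for the same data. The sketch below compares them on the question's series using only the standard library: linear interpolation between closest ranks (to my knowledge the documented default of pandas' `quantile`, which `statistics.quantiles(..., method="inclusive")` also implements) and the "median of each half" rule. The helper `quartiles_median_of_halves` is a hypothetical name introduced here.

```python
import statistics

data = [7, 15, 36, 39, 40, 41]

def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value dropped for odd n)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.median(lower), statistics.median(upper)

# Linear interpolation between closest ranks.
q1_lin, q2_lin, q3_lin = statistics.quantiles(data, n=4, method="inclusive")

# Median-of-halves convention.
q1_half, q3_half = quartiles_median_of_halves(data)
```

For this series the interpolated quartiles are 20.25 and 39.75, while the median-of-halves rule gives 15 and 40, so neither result is wrong; they answer slightly different definitions.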

NumPy statistical functions

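numpy.amin() and numpy.amax()

numpy.amin() computes the minimum of the array elements along the specified axis. numpy.amax() computes the maximum of the array elements along the specified axis.

    import numpy as np
    a = np.array([[1, 3, 6], [3, 4, 11], [6, 1, 4]])
    print(np.amin(a, 1))  # minimum of each row
    print(np.amin(a, 0))  # minimum of each column
    print(np.amax(a))     # maximum over all elements
    print(np.amax(a, 1))  # maximum of each row

Result:

    [1 3 1]
    [1 1 4]
    11
    [ 6 11  6]

numpy.ptp()

Computes the difference between the largest and smallest elements of the array (maximum - minimum).

numpy.percentile()

Computes percentiles.

    numpy.percentile(a, q, axis)

    a: the input array
    q: the percentile to compute
    axis: the axis along which the percentile is computed

For an array, if we set the percentile to 20, we obtain the value that sits at the 20% position of the sorted array. For example:

    # percentile
    a = np.array([[21, 60, 4], [10, 20, 1]])
    print('Array a:')
    print(a)
    print('Calling percentile():')
    # The 50th percentile is the median of a after sorting
    print(np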
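
The excerpt stops before the call itself, so the following is an independent sketch (not the original author's code) of how `np.percentile` behaves on the same array, with and without the `axis` argument. The variable names are illustrative.

```python
import numpy as np

a = np.array([[21, 60, 4], [10, 20, 1]])

overall_median = np.percentile(a, 50)       # computed over the flattened array
per_column = np.percentile(a, 50, axis=0)   # one value per column
per_row = np.percentile(a, 50, axis=1)      # one value per row
```

Without `axis`, the six values are treated as one sorted sequence (1, 4, 10, 20, 21, 60), so the 50th percentile falls halfway between 10 and 20.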

NumPy basics (6): statistical functions and sorting

Statistical functions

1. numpy.amin() and numpy.amax(): these functions return the minimum and maximum of the elements of a given array along the specified axis.

    >>> import numpy as np
    >>> a = np.array([[3,7,5],[8,4,3],[2,4,9]])
    >>> a
    array([[3, 7, 5],
           [8, 4, 3],
           [2, 4, 9]])
    >>> np.amin(a,1)
    array([3, 3, 2])
    >>> np.amin(a,0)
    array([2, 4, 3])
    >>> np.amax(a)
    9
    >>> np.amax(a, axis = 0)
    array([8, 7, 9])

2. numpy.ptp(): this function returns the range of values (maximum - minimum) along an axis.

    >>> import numpy as np
    >>> a = np.array([[3,7,5],[8,4,3],[2,4,9]])
    >>> a
    array([[3, 7, 5],
           [8, 4, 3],
           [2, 4, 9]])
    >>> np.ptp(a)
    7
    >>> np.ptp(a, axis = 1)
    array([4, 5, 7])
    >>> np.ptp(a, axis = 0)
    array([6, 3, 6])

3. numpy.percentile(): a percentile is a measure used in statistics that indicates the value below which a given percentage of the observations fall. The function numpy

Python Pandas - how is 25 percentile calculated by describe function

Question: For a given dataset in a data frame, when I apply the describe function, I get the basic stats, which include min, max, 25%, 50%, etc. For example:

    data_1 = pd.DataFrame({'One': [4, 6, 8, 10]}, columns=['One'])
    data_1.describe()

The output is:

                 One
    count   4.000000
    mean    7.000000
    std     2.581989
    min     4.000000
    25%     5.500000
    50%     7.000000
    75%     8.500000
    max    10.000000

My question is: what is the mathematical formula to calculate the 25%?

1) Based on what I know, it is: formula = percentile * n (n is number of
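
The numbers in the describe output above are consistent with linear interpolation at the fractional position (n - 1) * q in the sorted data, rather than with percentile * n. The sketch below spells that formula out in plain Python; `linear_quantile` is a hypothetical helper written for illustration, not pandas' own implementation.

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    # Fractional 0-based position of the quantile in the sorted data.
    pos = (len(ordered) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

sample = [4, 6, 8, 10]
q25 = linear_quantile(sample, 0.25)
q50 = linear_quantile(sample, 0.50)
q75 = linear_quantile(sample, 0.75)
```

For the 25% case the position is (4 - 1) * 0.25 = 0.75, so the result lies three quarters of the way from 4 to 6, which is 5.5 and matches the 25% row shown in the question.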

Categorize dataframe by percentile in R

Question: I have the following data:

    set.seed(15)
    ddf <- data.frame(
      gp1 = sample(1:3, 200, replace=T),
      gp2 = sample(c('a','b'), 200, replace=T),
      param = sample(10:20, 200, replace=T)
    )
    head(ddf)
      gp1 gp2 param
    1   2   a    18
    2   1   b    11
    3   3   a    15
    4   2   b    20
    5   2   a    17
    6   3   b    11

I have to create another column called 'category', which needs to have a value of 1 if 'param' for that row is more than the 75th percentile for that gp1 and gp2. I tried the following, but I am not sure if it is correct:

    ddf$category = with(ddf, ifelse
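
The grouping logic described in the question (compare each row with the 75th percentile of its own gp1/gp2 group) can be illustrated in pandas rather than R. This is only a sketch: the six-row frame is made up for the example, and assigning 0 to the remaining rows is an assumption, since the question only specifies when the value should be 1.

```python
import pandas as pd

ddf = pd.DataFrame({
    "gp1": [1, 1, 1, 1, 2, 2],
    "gp2": ["a", "a", "a", "a", "a", "a"],
    "param": [10, 12, 14, 20, 5, 7],
})

# 75th percentile of param within each (gp1, gp2) group, aligned back to the rows.
q75 = ddf.groupby(["gp1", "gp2"])["param"].transform(lambda s: s.quantile(0.75))

# 1 where the row exceeds its group's 75th percentile, else 0 (the 0 is an assumption).
ddf["category"] = (ddf["param"] > q75).astype(int)
```

The key point is that the percentile is computed per group and broadcast to every row of that group before the comparison, instead of being computed once over the whole column.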

within group sorts in mysql

Question: I have a panel data set: that is, times, ids, and values. I would like to do a ranking based on value for each date. I can achieve the sort very simply by running:

    select * from tbl order by date, value

The issue I have is that once the table is sorted in this way, how do I retrieve the row number within each group (that is, for each date I would like there to be a column called ranking that goes from 1 to N)?

Example:

Input:

    Date, ID, Value
    d1, id1, 2
    d1, id2, 1
    d2, id1, 10
    d2, id2, 11

Output:
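
The ranking the question describes (a counter that restarts at 1 for each date, following the order by date, value) can be sketched outside the database as follows. This is an illustration of the logic only, assuming the rows have been fetched into Python tuples and that ranks ascend with value; `rank_within_date` is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# The input rows from the question's example: (date, id, value).
rows = [
    ("d1", "id1", 2),
    ("d1", "id2", 1),
    ("d2", "id1", 10),
    ("d2", "id2", 11),
]

def rank_within_date(records):
    """Return (date, id, value, ranking) with ranking restarting at 1 for each date."""
    ordered = sorted(records, key=itemgetter(0, 2))  # order by date, then value
    ranked = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        for position, (date, ident, value) in enumerate(group, start=1):
            ranked.append((date, ident, value, position))
    return ranked

ranked_rows = rank_within_date(rows)
```

Sorting first is what makes `groupby` see each date as one contiguous run; the same "sort, then number within each partition" idea is what a per-group row number expresses in SQL.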

Select top 10 percent, also bottom percent in SQL Server

Question: I have two questions. When using the select top 10 percent statement, for example on a test database with 100 scores, like this:

    Select top 10 percent score from test

would SQL Server return the 10 highest scores, or just the first 10 rows based on how the data are currently stored (e.g. if the data were entered into the database so that the lowest score appears first, would this return the lowest 10 scores)? I want to be able to get the top 10 highest scores and bottom 10 lowest scores out of this
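
Whether a "top 10 percent" selection means the highest scores depends entirely on an explicit ordering; without one, the rows that come back are not guaranteed to be the highest or the lowest. The sketch below shows the intended result (highest and lowest slices) in plain Python, assuming the scores are available as a list. The helper name `top_and_bottom_percent`, the rounding-up of the row count, and the 1..100 sample data are assumptions made for the example.

```python
import heapq
import math

def top_and_bottom_percent(scores, percent):
    """Return (highest, lowest) lists, each holding `percent` percent of the scores."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Round the row count up; this rounding choice is an assumption of the sketch.
    count = math.ceil(len(scores) * percent / 100)
    highest = heapq.nlargest(count, scores)   # ordered high to low
    lowest = heapq.nsmallest(count, scores)   # ordered low to high
    return highest, lowest

scores = list(range(1, 101))  # 100 hypothetical scores: 1..100
top10, bottom10 = top_and_bottom_percent(scores, 10)
```

In SQL terms, the two lists correspond to ordering by score descending and ascending, respectively, before taking the top rows.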