Which method does pandas use for percentile?

Submitted by 牧云@^-^@ on 2020-01-24 15:56:49

Question


I was trying to understand how pandas calculates the lower/upper percentiles and got a bit confused. Here is the sample code and its output.

import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()

output:

I am interested only in the 25% and 75% percentiles. Which method does pandas use to calculate them?

According to the article https://en.wikipedia.org/wiki/Quartile, the results are different, as follows:

So what statistical/mathematical method does pandas use to calculate percentiles?


Answer 1:


As I mentioned in the comments, I finally figured out how it works by trying the quantile function (from pandas.core.algorithms import quantile), as @Abdou suggested.

I am not good at explaining this in writing, so I will only walk through the given example for 25% and 75%. Here is a brief (maybe poor) explanation:

For the example list [7, 15, 36, 39, 40, 41], the values map to quantiles as follows:

7 -> 0%

15 -> 20%

36 -> 40%

39 -> 60%

40 -> 80%

41 -> 100%

Since we want the 25% percentile, it lies between 15 (20%) and 36 (40%). Because 25% = 20% + 5%, the value is 15 + (36-15)/4 = 15 + 5.25 = 20.25.

We use (36-15)/4 because the gap between 15 and 36 spans 40% - 20% = 20%, so dividing it by 4 gives the step for 5%.

We can find the 75% percentile the same way. It lies between 39 (60%) and 40 (80%), and 75% = 60% + 15%, so the value is:

39 + 3*(40-39)/4 = 39.75
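The interpolation described in this answer can be sketched in plain Python. This is only an illustration of the linear-interpolation idea, not pandas' actual implementation; the helper name linear_quantile is made up for this example.

```python
import math


def linear_quantile(values, q):
    """Quantile q (0..1) by linear interpolation between sorted neighbours.

    Hypothetical helper for illustration; not part of pandas.
    """
    data = sorted(values)
    # Fractional position of q on the index scale 0..(n-1)
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # How far q sits between the two neighbouring values
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac


sample = [7, 15, 36, 39, 40, 41]
q25 = linear_quantile(sample, 0.25)
q75 = linear_quantile(sample, 0.75)
print(q25, q75)  # prints: 20.25 39.75
```

With six values, each step between neighbours covers 20%, which is why 25% lands a quarter of the way from 15 to 36.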

That's it. I am really sorry for the poor explanation.




Answer 2:


It computes [series.quantile(x) for x in percentiles], where percentiles defaults to np.array([0.25, 0.5, 0.75]) if it is not provided.

You can see that in pandas/pandas/core/generic.py

So it is using Series.quantile: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.quantile.html
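A quick way to check this is to compare the describe() rows with direct Series.quantile calls. This is a minimal sketch, assuming a reasonably recent pandas version. It also shows the interpolation argument of Series.quantile, which defaults to "linear" and explains why the result can differ from other quartile definitions.

```python
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])
summary = test.describe()

# describe() labels its percentile rows "25%", "50%" and "75%"
for q, label in [(0.25, "25%"), (0.5, "50%"), (0.75, "75%")]:
    print(label, summary[label], test.quantile(q))

# Other interpolation modes pick an actual data point instead of interpolating
lower = test.quantile(0.25, interpolation="lower")
higher = test.quantile(0.25, interpolation="higher")
print(lower, higher)
```

Both columns printed by the loop should agree, since describe() and quantile() use the same default linear interpolation.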



Source: https://stackoverflow.com/questions/41744275/which-method-does-pandas-use-for-percentile
