Python equivalent of Excel's PERCENTILE.EXC

I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio

2 Answers

长情又很酷

2020-12-19 16:13

It won't be as efficient as Pandas' own quantile(), but it should work:

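import pandas as pd

def quantile_exc(ser, q):
    ser_sorted = ser.sort_values()
    n = len(ser)
    rank = q * (n + 1) - 1  # Excel's 1-based rank q*(n+1), shifted to 0-based
    # Excel returns #NUM! when the rank falls outside the data
    assert 0 <= rank <= n - 1, 'quantile is outside the valid range'
    rank_l = int(rank)
    if rank_l == n - 1:  # largest allowed rank: nothing above to interpolate with
        return ser_sorted.iat[rank_l]
    return ser_sorted.iat[rank_l] + (ser_sorted.iat[rank_l + 1] -
                                     ser_sorted.iat[rank_l]) * (rank - rank_l)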

ser = pd.Series([15,14,18,-2,6,-78,31,21,98,-54,-2,-36,5,2,46,-72,3,-2,7,9,34])

quantile_exc(ser, 0.05)
Out: -77.400000000000006

quantile_exc(ser, 0.1)
Out: -68.399999999999991

quantile_exc(ser, 0.3)
Out: -2.0
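
As a cross-check that does not need pandas, the standard library's `statistics.quantiles` (Python 3.8+) ranks by p * (n + 1) by default (`method='exclusive'`), the same convention as PERCENTILE.EXC. It only returns evenly spaced cut points, so the 5th percentile is the first of the 19 cut points returned for `n=20`. A sketch under that assumption, reusing the series above as a plain list:

```python
import statistics

# Same values as the pandas Series above, as a plain list
values = [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36, 5, 2, 46, -72, 3, -2, 7, 9, 34]

# method='exclusive' places the i-th cut point at rank (i / n) * (len(values) + 1)
p05 = statistics.quantiles(values, n=20, method='exclusive')[0]  # 5th percentile
deciles = statistics.quantiles(values, n=10, method='exclusive')
p10, p30 = deciles[0], deciles[2]                                 # 10th and 30th percentiles

print(p05, p10, p30)
```

These should agree with the quantile_exc results above up to floating-point noise. One difference to keep in mind: for percentiles below 1/(n + 1) or above n/(n + 1), `statistics.quantiles` does not raise an error; it extrapolates from the two smallest (or largest) values instead of failing the way Excel does.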

Note that Excel fails (returns #NUM!) for small percentiles; this is not a bug. For k < 1/(n + 1) the rank falls below the minimum value, so there is no lower neighbour to interpolate from (the same happens above the maximum for k > n/(n + 1)). So you might want to check that the rank stays within the data in the quantile_exc function (see the assertion).
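
A pandas-free sketch of the same calculation with the range check made explicit: it raises `ValueError` where Excel would return #NUM!, i.e. whenever k falls outside [1/(n + 1), n/(n + 1)]. The function name `percentile_exc` is hypothetical, and treating the exact endpoints as valid is an assumption about Excel's behaviour.

```python
def percentile_exc(values, k):
    """PERCENTILE.EXC-style percentile: 1-based rank k*(n+1), linear interpolation.

    Hypothetical helper; raises ValueError where Excel would return #NUM!.
    """
    data = sorted(values)
    n = len(data)
    if n == 0 or not 0 < k < 1:
        raise ValueError("need non-empty data and 0 < k < 1")
    rank = k * (n + 1)                      # 1-based rank, as Excel defines it
    # Below the minimum or above the maximum there is no neighbour to
    # interpolate with, so the percentile is undefined.
    if rank < 1 or rank > n:
        raise ValueError(f"k must lie in [1/{n + 1}, {n}/{n + 1}] for {n} values")
    lo = int(rank)                          # 1-based index of the lower neighbour
    if lo == n:                             # rank lands exactly on the maximum
        return float(data[-1])
    return data[lo - 1] + (data[lo] - data[lo - 1]) * (rank - lo)


values = [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36, 5, 2, 46, -72, 3, -2, 7, 9, 34]
print(percentile_exc(values, 0.05))         # approximately -77.4
try:
    percentile_exc(values, 0.04)            # rank 0.88 < 1: below the minimum
except ValueError as err:
    print("error:", err)
```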

一个人的身影

2020-12-19 16:20
EDIT: I just saw your edit. I think you are making a mistake. The value -77.4 is actually the 0.5th percentile of your data. Try test_pd.quantile(.005). I believe you must have made a mistake in Excel when specifying your percentile.

EDIT 2: I just tested it myself in Excel. For the 50th percentile, I get the correct value in both Excel and NumPy/Pandas. For the 5th percentile, however, I get -72 in Pandas/NumPy and -74.6 in Excel. But Excel is just wrong here: it is very obvious that -74.6 is the 0.5th percentile, not the 5th...

FINAL EDIT: After some testing, it seems that Excel's PERCENTILE.EXC() behaves erratically for very small values of k. Indeed, using the function with any k < 0.05 returns an error, so 0.05 must be a threshold below which the function does not work properly. I do not know why Excel chooses to return the 0.5th percentile when asked to exclude the 5th percentile (the logical behavior would be to return the 4.9th percentile, or the 4.99th...). However, NumPy, Pandas and Excel all return the same values for other values of k. For instance, PERCENTILE.EXC(array, 0.5) = 6, and test_pd.quantile(0.5) = 6 as well. I guess the lesson is that we need to be wary of Excel's behavior ;).
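
The gap between the two results can be traced to the rank each function uses. Pandas' and NumPy's default linear interpolation places the k-th percentile at 0-based position k * (n - 1), the same convention as Excel's PERCENTILE.INC, while PERCENTILE.EXC uses the 1-based rank k * (n + 1). A sketch comparing the two on the 21-value series from the other answer (the asker's data isn't shown, so this is an illustration, not a reproduction of the numbers above); `interpolate_at` is a hypothetical helper:

```python
def interpolate_at(sorted_data, pos):
    """Linearly interpolate sorted_data at a 0-based fractional position."""
    assert pos >= 0, "position below the smallest value"
    lo = int(pos)
    if lo >= len(sorted_data) - 1:          # at (or past) the last element
        return float(sorted_data[-1])
    return sorted_data[lo] + (sorted_data[lo + 1] - sorted_data[lo]) * (pos - lo)


ser = sorted([15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36, 5, 2, 46, -72, 3, -2, 7, 9, 34])
n = len(ser)

results = {}
for k in (0.05, 0.5):
    pos_inc = k * (n - 1)        # pandas/NumPy default 'linear', Excel PERCENTILE.INC
    pos_exc = k * (n + 1) - 1    # Excel PERCENTILE.EXC, shifted to 0-based
    results[k] = (interpolate_at(ser, pos_inc), interpolate_at(ser, pos_exc))
    print(k, *results[k])
```

On this series the two conventions give -72.0 and about -77.4 at k = 0.05 but both give 6.0 at k = 0.5. That is expected: k * (n - 1) and k * (n + 1) - 1 coincide only when k = 0.5.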

As I understand your problem, you want the value that corresponds to the k-th percentile of your data, with the k-th percentile itself excluded. However, Series.quantile() returns the value that corresponds to the k-th percentile with that percentile included.

I do not think that Series.quantile() including the k-th percentile is an issue. Indeed, assuming you want all stocks whose Value at Risk is strictly below the 5th percentile, you would do:
```
# data is assumed to be a DataFrame with a "VaR" column
mask = data["VaR"] < data["VaR"].quantile(0.05)
data_filt = data[mask]
```
Because you used a "smaller than" (<) operator, the values that exactly equal your 5th percentile will be excluded, similar to Excel's PERCENTILE.EXC() function.
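
The effect of the operator only shows when the cutoff coincides with a data point. A plain-Python sketch on the series from the other answer, where Pandas' default 5th percentile lands exactly on the value -72 (position 0.05 * 20 = 1, the second-smallest value); the cutoff is hard-coded here as an assumption to keep the example free of pandas:

```python
values = [15, 14, 18, -2, 6, -78, 31, 21, 98, -54, -2, -36, 5, 2, 46, -72, 3, -2, 7, 9, 34]
cutoff = -72.0   # assumed: Pandas' default (linear) 5th percentile of these values

strictly_below = [v for v in values if v < cutoff]   # cutoff value itself excluded
at_or_below = [v for v in values if v <= cutoff]     # cutoff value itself included

print(strictly_below, at_or_below)
```

Here `<` keeps only -78, while `<=` also keeps -72. When the cutoff falls strictly between two data points, both operators select the same values.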

Do tell me if this is what you were looking for.