可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I want to pass the numpy percentile() function through pandas' agg() function as I do below with various other numpy statistics functions.
Right now I have a dataframe that looks like this:
AGGREGATE MY_COLUMN A 10 A 12 B 5 B 9 A 84 B 22
And my code looks like this:
grouped = dataframe.groupby('AGGREGATE') column = grouped['MY_COLUMN'] column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max])
The above code works, but I want to do something like
column.agg([np.sum, np.mean, np.percentile(50), np.percentile(95)])
i.e. specify various percentiles to return from agg()
How should this be done?
回答1:
Perhaps not super efficient, but one way would be to create a function yourself:
def percentile(n): def percentile_(x): return np.percentile(x, n) percentile_.__name__ = 'percentile_%s' % n return percentile_
Then include this in your agg
:
In [11]: column.agg([np.sum, np.mean, np.std, np.median, np.var, np.min, np.max, percentile(50), percentile(95)]) Out[11]: sum mean std median var amin amax percentile_50 percentile_95 AGGREGATE A 106 35.333333 42.158431 12 1777.333333 10 84 12 76.8 B 36 12.000000 8.888194 9 79.000000 5 22 12 76.8
Note sure this is how it should be done though...
回答2:
Being more specific, if you just want to aggregate your pandas groupby results using the percentile function, the python lambda function offers a pretty neat solution. Using the question's notation, aggregating by the percentile 95, should be:
dataframe.groupby('AGGREGATE').agg(lambda x: np.percentile(x['COL'], q = 95))
You can also assign this function to a variable and use it in conjunction with other aggregation functions.
回答3:
Try this for the 50% and 95% percentile:
column.describe( percentiles = [ 0.5, 0.95 ] )