Calculate percentile for every value in a column of dataframe

Posted by 孤街醉人 on 2019-11-28 05:29:25

Question


I am trying to calculate the percentile of every value in column a of a DataFrame x.

Is there a better way to write the following piece of code?

from scipy import stats

x["pcta"] = [stats.percentileofscore(x["a"].values, i)
             for i in x["a"].values]

I would like to see better performance.


Answer 1:


It seems like you want Series.rank():

x["pcta"] = x["a"].rank(pct=True)  # fraction between 0 and 1; multiply by 100 for a percentage
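
To check that the two approaches agree, here is a minimal sketch. The sample values are hypothetical, since the original DataFrame is not shown. It assumes the default kind="rank" of percentileofscore and the default method="average" of rank, which treat ties the same way.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data with ties; the question does not show the real data.
x = pd.DataFrame({"a": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]})

# Original approach: one percentileofscore call per value (scale 0-100).
slow = np.array([stats.percentileofscore(x["a"].values, v)
                 for v in x["a"].values])

# Vectorised approach: average rank divided by the number of values (scale 0-1),
# rescaled to 0-100 so that it is comparable with the loop above.
fast = x["a"].rank(pct=True) * 100

# Both give the same percentile for every row, up to floating-point error.
assert np.allclose(slow, fast.to_numpy())
```

Note the difference in scale: percentileofscore returns values between 0 and 100, while rank(pct=True) returns fractions between 0 and 1.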

Performance:

import scipy.stats as scs

%timeit [scs.percentileofscore(x["a"].values, i) for i in x["a"].values]
1000 loops, best of 3: 877 µs per loop

%timeit x.rank(pct=True)
10000 loops, best of 3: 107 µs per loop
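
%timeit is only available in IPython or Jupyter. A rough equivalent for a plain Python script, using the standard timeit module, could look like the sketch below. The data size and random values are assumptions, and absolute timings will vary by machine and library version. The loop rescans the whole column for every value, so its cost grows roughly quadratically with the number of rows, and the gap should widen on larger inputs.

```python
import timeit

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical test data; the size used in the original timings is unknown.
rng = np.random.default_rng(0)
x = pd.DataFrame({"a": rng.integers(0, 100, size=200)})


def loop_version():
    # One full pass over the column per value.
    values = x["a"].values
    return [stats.percentileofscore(values, v) for v in values]


def rank_version():
    # A single vectorised ranking of the column.
    return x["a"].rank(pct=True)


# Best of 3 repeats, 10 calls each, reported per call.
t_loop = min(timeit.repeat(loop_version, number=10, repeat=3)) / 10
t_rank = min(timeit.repeat(rank_version, number=10, repeat=3)) / 10

print(f"loop: {t_loop * 1e6:.0f} µs per call")
print(f"rank: {t_rank * 1e6:.0f} µs per call")
```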


Source: https://stackoverflow.com/questions/44211653/calculate-percentile-for-every-value-in-a-column-of-dataframe
