I'm fairly new to Python and pandas (from using SAS as my workhorse analytical platform), so I apologize in advance if this has already been asked / answered. (I've search
There is a statistics and econometrics library (statsmodels) that appears to handle this. Here's an example that extends @MSeifert's answer here on a similar question.
import pandas as pd
from statsmodels.stats.weightstats import DescrStatsW

df = pd.DataFrame({'x': range(1, 101), 'wt': range(1, 101)})
# ddof=1 divides by (sum of weights - 1) when computing the variance
wdf = DescrStatsW(df.x, weights=df.wt, ddof=1)
print(wdf.mean)
print(wdf.std)
print(wdf.quantile([0.25, 0.50, 0.75]))
67.0
23.6877840059
p
0.25 50
0.50 71
0.75 87
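As a sanity check, the same numbers can be reproduced by hand. This is only a sketch using the standard library, not a description of how statsmodels computes them internally. The helper `weighted_quantile` is a name I made up, and its rule (take the smallest x whose cumulative weight reaches p times the total weight) is one simple convention that happens to match the output above for this data.

```python
import math

x = list(range(1, 101))
wt = list(range(1, 101))

total_w = sum(wt)
# Weighted mean: sum(w * x) / sum(w)
mean = sum(w * v for v, w in zip(x, wt)) / total_w
# Weighted sum of squared deviations from the mean
ss = sum(w * (v - mean) ** 2 for v, w in zip(x, wt))
# Treat the weights as counts, so the denominator is sum(w) - 1
std = math.sqrt(ss / (total_w - 1))


def weighted_quantile(values, weights, p):
    """Hypothetical helper: smallest value whose cumulative weight
    reaches p * total weight (one convention among several)."""
    pairs = sorted(zip(values, weights))
    target = p * sum(weights)
    running = 0
    for v, w in pairs:
        running += w
        if running >= target:
            return v
    return pairs[-1][0]


quartiles = [weighted_quantile(x, wt, p) for p in (0.25, 0.50, 0.75)]
print(mean, round(std, 4), quartiles)
```

Other quantile conventions (for example, ones that interpolate between neighbouring values) can give slightly different answers on other data.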
I don't use SAS, but this gives the same answer as the Stata command:
sum x [fw=wt], detail
Stata actually has a few weight options and in this case gives a slightly different answer if you specify aw (analytical weights) instead of fw (frequency weights). Also, Stata requires fw to be an integer whereas DescrStatsW allows non-integer weights. Weights are more complicated than you'd think... This is starting to get into the weeds, but there is a great discussion of weighting issues for calculating the standard deviation here.
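To show why the two weight types can disagree, here is a rough sketch of the two standard deviation denominators. The fw version treats the sum of the weights as the sample size. The aw version reflects my understanding that analytic weights are rescaled to sum to the number of rows before the usual n - 1 correction is applied; treat that as an assumption rather than a statement of exactly what Stata does. The names `std_fw` and `std_aw` are mine.

```python
import math

x = list(range(1, 101))
wt = list(range(1, 101))

n = len(x)            # number of rows
total_w = sum(wt)     # sum of the weights
mean = sum(w * v for v, w in zip(x, wt)) / total_w
ss = sum(w * (v - mean) ** 2 for v, w in zip(x, wt))

# Frequency weights: each row stands for wt identical observations,
# so the effective sample size is sum(wt).
std_fw = math.sqrt(ss / (total_w - 1))

# Analytic-style weights (assumed convention): rescale the weights to
# sum to n, then apply the n - 1 correction.
std_aw = math.sqrt((ss / total_w) * n / (n - 1))

print(round(std_fw, 4), round(std_aw, 4))
```

The weighted mean is the same either way; only the variance denominator changes, which is why the difference is small here and shrinks as the number of rows grows.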
Also note that DescrStatsW does not appear to include functions for min and max, but as long as your weights are non-zero this should not be a problem, as the weights don't affect the min and max. However, if you did have some zero weights, it might be nice to have weighted min and max, and that is also easy to calculate in pandas:
df.x[ df.wt > 0 ].min()
df.x[ df.wt > 0 ].max()