Question:
numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?
Answer 1:
How about the following short "manual calculation"?
    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- Numpy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average)**2, weights=weights)
        return (average, math.sqrt(variance))
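A quick sanity check of this formula, assuming integer weights: weighting a value by k should then give the same result as repeating it k times and calling the unweighted mean() and std(). The sample data below is only illustrative, and the function is repeated so the snippet runs on its own.

```python
import math
import numpy as np

def weighted_avg_and_std(values, weights):
    """Weighted average and (population, ddof=0) standard deviation."""
    average = np.average(values, weights=weights)
    variance = np.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)

# Illustrative data: one value carries a weight of 100, the rest 1.
values = np.array([1, 2, 1, 2, 1, 2, 1, 3])
weights = np.ones_like(values)
weights[3] = 100

avg, std = weighted_avg_and_std(values, weights)

# With integer weights, repeating each value weights[i] times
# should reproduce the weighted results.
expanded = np.repeat(values, weights)
print(avg, expanded.mean())
print(std, expanded.std())
```

Note that this is the population form (ddof=0). For non-integer weights the repetition check does not apply, but the formula itself is unchanged.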
Answer 2:
There is a class in statsmodels for calculating weighted statistics, statsmodels.stats.weightstats.DescrStatsW:
    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW

    array = np.array([1, 2, 1, 2, 1, 2, 1, 3])
    weights = np.ones_like(array)
    weights[3] = 100

    weighted_stats = DescrStatsW(array, weights=weights, ddof=0)

    weighted_stats.mean      # weighted mean of data (equivalent to np.average(array, weights=weights))
    # 1.97196261682243

    weighted_stats.std       # standard deviation with default degrees of freedom correction (ddof=0)
    # 0.21434289609681711

    weighted_stats.std_mean  # standard deviation of weighted mean
    # 0.020818822467555047

    weighted_stats.var       # variance with default degrees of freedom correction (ddof=0)
    # 0.045942877107170932
The nice feature of this class is that if you want several different statistics, subsequent calls are very fast, because results that have already been calculated (even intermediate ones) are cached.
Answer 3:
There doesn't appear to be such a function in numpy/scipy yet, but there is a ticket proposing this added functionality. Included there you will find Statistics.py which implements weighted standard deviations.
Answer 4:
There is a very good example proposed by gaborous:
    import pandas as pd
    import numpy as np

    # X is the dataset, as a Pandas' DataFrame.
    # w holds the per-row weights (assumed here to be a Pandas' Series aligned with X's index).

    # Computing the weighted sample mean (fast, efficient and precise)
    mean = np.ma.average(X, axis=0, weights=w)

    # Convert to a Pandas' Series (it's just aesthetic and more
    # ergonomic; no difference in computed values)
    mean = pd.Series(mean, index=list(X.keys()))

    xm = X - mean       # xm = X diff to mean
    # Fill NaN with 0 (a variance of 0 is void anyway, and this keeps
    # the other covariance values computed correctly)
    xm = xm.fillna(0)

    # Compute the unbiased weighted sample covariance
    sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)
Correct equation for weighted unbiased sample covariance, URL (version: 2016-06-28)
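For what it's worth, the same formula can be cross-checked in plain NumPy. The sketch below assumes the weights are integer frequency counts, in which case numpy.cov with its fweights argument should give the same matrix. The X and w values are made up for illustration.

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
# Hypothetical integer frequency weights, one per row.
w = np.array([1, 2, 1, 3])

# Weighted mean of each column.
mean = np.average(X, axis=0, weights=w)

# Deviations from the weighted mean.
xm = X - mean

# Weighted covariance with the 1 / (sum(w) - 1) normalisation.
sigma2 = (xm * w[:, None]).T @ xm / (w.sum() - 1)

# Cross-check against NumPy's built-in frequency-weighted covariance.
reference = np.cov(X, rowvar=False, fweights=w)
print(np.allclose(sigma2, reference))
```

The 1 / (sum(w) - 1) factor is the usual correction for frequency weights. Other kinds of weights (for example reliability weights) call for a different normalisation, so this check only covers the integer-count case.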