
Question:

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?

Answer 1:

How about the following short "manual calculation"?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- Numpy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values-average)**2, weights=weights)
        return (average, math.sqrt(variance))
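
As a quick sanity check of the approach above: with integer weights, the weighted result should match the plain statistics of an array in which each value is repeated as many times as its weight. Below is a minimal sketch, with the function restated so the block runs on its own. The sample data is made up for illustration.

```python
import math
import numpy

def weighted_avg_and_std(values, weights):
    """Return the weighted average and (population) standard deviation."""
    average = numpy.average(values, weights=weights)
    variance = numpy.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)

# Made-up sample data; the weights are integers so they can act as repeat counts.
values = numpy.array([1.0, 2.0, 4.0])
weights = numpy.array([3, 1, 2])

avg, std = weighted_avg_and_std(values, weights)

# Expanding each value `weight` times gives an unweighted equivalent.
expanded = numpy.repeat(values, weights)
print(numpy.isclose(avg, expanded.mean()))  # True
print(numpy.isclose(std, expanded.std()))   # True; numpy.std defaults to ddof=0
```

Note that this is the population (ddof=0) standard deviation. Whether a sample correction is appropriate depends on what the weights represent.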

Answer 2:

There is a class in statsmodels to calculate weighted statistics: statsmodels.stats.weightstats.DescrStatsW:

    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW

    array = np.array([1,2,1,2,1,2,1,3])
    weights = np.ones_like(array)
    weights[3] = 100

    weighted_stats = DescrStatsW(array, weights=weights, ddof=0)

    weighted_stats.mean      # weighted mean of data (equivalent to np.average(array, weights=weights))
    # 1.97196261682243

    weighted_stats.std       # standard deviation with default degrees of freedom correction
    # 0.21434289609681711

    weighted_stats.std_mean  # standard deviation of weighted mean
    # 0.020818822467555047

    weighted_stats.var       # variance with default degrees of freedom correction
    # 0.045942877107170932

A nice feature of this class is that, if you need several statistical properties, subsequent calls are very fast because results that have already been calculated (even intermediate ones) are cached.
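
The values quoted above can be cross-checked with plain NumPy. This is only a sketch, and it rests on one assumption: that with ddof=0, `std_mean` equals the standard deviation divided by the square root of (sum of weights - 1). That formula is consistent with the printed numbers, but it is not taken from the statsmodels documentation.

```python
import numpy as np

array = np.array([1, 2, 1, 2, 1, 2, 1, 3])
weights = np.ones_like(array)
weights[3] = 100

sum_weights = weights.sum()
mean = np.average(array, weights=weights)
var = np.average((array - mean) ** 2, weights=weights)  # ddof=0
std = np.sqrt(var)
# Assumed formula for the standard deviation of the weighted mean.
std_mean = std / np.sqrt(sum_weights - 1)

print(mean, var, std, std_mean)
```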

Answer 3:

There doesn't appear to be such a function in numpy/scipy yet, but there is a ticket proposing to add this functionality. Included there you will find Statistics.py, which implements weighted standard deviations.

Answer 4:

There is a very good example proposed by gaborous:

    import pandas as pd
    import numpy as np

    # X is the dataset, as a Pandas' DataFrame
    # w holds the weights, one per row of X
    # Computing the weighted sample mean (fast, efficient and precise)
    mean = np.ma.average(X, axis=0, weights=w)

    # Convert to a Pandas' Series (it's just aesthetic and more
    # ergonomic; no difference in computed values)
    mean = pd.Series(mean, index=list(X.keys()))

    xm = X - mean  # xm = X diff to mean
    # fill NaN with 0 (because anyway a variance of 0 is just void, but at
    # least it keeps the other covariance's values computed correctly)
    xm = xm.fillna(0)
    # Compute the unbiased weighted sample covariance
    sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)

Correct equation for weighted unbiased sample covariance, URL (version: 2016-06-28)
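
For integer (frequency) weights, the covariance expression above can be checked against `np.cov` with its `fweights` argument. Below is a minimal sketch on made-up data, using plain NumPy arrays in place of the DataFrame. `X` and `w` here are illustrative and are not the original author's dataset.

```python
import numpy as np

# Made-up data: 4 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [4.0, 3.0],
              [3.0, 5.0]])
w = np.array([1, 3, 2, 1])  # integer frequency weights, one per row

mean = np.average(X, axis=0, weights=w)
xm = X - mean
# Same expression as in the answer: 1/(sum(w) - 1) * xm^T diag(w) xm
sigma2 = (xm * w[:, None]).T @ xm / (w.sum() - 1)

# np.cov treats fweights as repeat counts and uses ddof=1 by default.
reference = np.cov(X, rowvar=False, fweights=w)
print(np.allclose(sigma2, reference))  # True
```

The square roots of the diagonal of `sigma2` are then the weighted sample standard deviations of each column.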

Source: Weighted standard deviation in NumPy?
