Weighted standard deviation in NumPy?

Anonymous (unverified), submitted on 2019-12-03 02:08:02

Question:

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?

Answer 1:

How about the following short "manual calculation"?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- Numpy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values-average)**2, weights=weights)
        return (average, math.sqrt(variance))
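A quick way to sanity-check the function above is to feed it integer weights, which should behave like repetition counts. The sketch below repeats the function so it runs on its own; the sample values and weights are made up for illustration.

```python
import math
import numpy as np

def weighted_avg_and_std(values, weights):
    """Return the weighted average and (population) standard deviation."""
    average = np.average(values, weights=weights)
    variance = np.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)

# Hypothetical sample data: the value 2.0 carries three times the weight.
values = np.array([1.0, 2.0, 3.0])
weights = np.array([1.0, 3.0, 1.0])

avg, std = weighted_avg_and_std(values, weights)

# With integer weights the result should match the unweighted statistics
# of the explicitly repeated data [1, 2, 2, 2, 3].
repeated = np.repeat(values, weights.astype(int))
print(avg, std)
print(repeated.mean(), repeated.std())
```

Note that this is the population form (it divides by the sum of the weights), so it corresponds to numpy.std with its default ddof=0.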


Answer 2:

There is a class in statsmodels to calculate weighted statistics: statsmodels.stats.weightstats.DescrStatsW:

    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW

    array = np.array([1, 2, 1, 2, 1, 2, 1, 3])
    weights = np.ones_like(array)
    weights[3] = 100

    weighted_stats = DescrStatsW(array, weights=weights, ddof=0)

    weighted_stats.mean      # weighted mean of data (equivalent to np.average(array, weights=weights))
    # 1.97196261682243

    weighted_stats.std       # standard deviation with default degrees of freedom correction
    # 0.21434289609681711

    weighted_stats.std_mean  # standard deviation of weighted mean
    # 0.020818822467555047

    weighted_stats.var       # variance with default degrees of freedom correction
    # 0.045942877107170932

A nice feature of this class is that, if you want several different statistics, subsequent calls are very fast, because results that have already been calculated (even intermediate ones) are cached.
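To see what the ddof argument changes without installing statsmodels, here is a NumPy-only sketch. It assumes the variance is the weighted sum of squared deviations divided by (sum of weights minus ddof), which is my understanding of how DescrStatsW treats weights as frequency counts. The helper name weighted_var is hypothetical and not part of any library.

```python
import numpy as np

def weighted_var(values, weights, ddof=0):
    """Weighted variance, treating the weights as frequency counts (assumption)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    sum_sq = np.sum(weights * (values - mean) ** 2)
    return sum_sq / (weights.sum() - ddof)

# Same data as in the answer above.
array = np.array([1, 2, 1, 2, 1, 2, 1, 3])
weights = np.ones_like(array)
weights[3] = 100

var0 = weighted_var(array, weights, ddof=0)  # should agree with the .var value quoted above
var1 = weighted_var(array, weights, ddof=1)  # sample variance of the repeated data
print(var0, np.sqrt(var0))
print(var1)
```

With ddof=1 the result equals the ordinary sample variance of the data with each value repeated according to its weight, which is one way to decide whether that correction suits your weights.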



Answer 3:

There doesn't appear to be such a function in NumPy/SciPy yet, but there is a ticket proposing this functionality. Attached to it you will find Statistics.py, which implements weighted standard deviations.
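As a side note, more recent NumPy releases let numpy.cov take weights through its fweights and aweights parameters, which can serve as another workaround. The sketch below assumes a NumPy version that supports aweights and uses made-up data. It is not the Statistics.py implementation from the ticket.

```python
import numpy as np

# Hypothetical sample data with relative (non-integer) weights.
values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.1, 0.4, 0.4, 0.1])

# For 1-D input np.cov returns the variance as a 0-d array.
# With ddof=0 the normalisation is the sum of the weights.
weighted_var = np.cov(values, aweights=weights, ddof=0)
weighted_std = np.sqrt(weighted_var)

# Cross-check against the manual calculation via np.average.
mean = np.average(values, weights=weights)
manual_std = np.sqrt(np.average((values - mean) ** 2, weights=weights))
print(float(weighted_std), float(manual_std))
```

The two printed numbers should agree. For other ddof values, np.cov normalises aweights and fweights differently, so check its documentation before relying on them.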



Answer 4:

There is a very good example proposed by gaborous:

    import pandas as pd
    import numpy as np

    # X is the dataset, as a pandas DataFrame; w holds one weight per row of X
    # Compute the weighted sample mean (fast, efficient and precise)
    mean = np.ma.average(X, axis=0, weights=w)

    # Convert to a pandas Series (purely for ergonomics; no difference in computed values)
    mean = pd.Series(mean, index=list(X.keys()))

    xm = X - mean      # deviation of X from the mean
    # Fill NaN with 0 (a variance of 0 is meaningless there, but it keeps
    # the other covariance values computed correctly)
    xm = xm.fillna(0)

    # Compute the unbiased weighted sample covariance
    sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)

Correct equation for weighted unbiased sample covariance, URL (version: 2016-06-28)
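The (w.sum() - 1) denominator above is the correction that fits weights acting as repetition counts. The NumPy-only sketch below checks this on made-up data by comparing the formula against numpy.cov applied to explicitly repeated rows. It leaves out pandas and the NaN handling, and the data and variable names are only for illustration.

```python
import numpy as np

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0]])
w = np.array([1, 3, 2, 1])  # integer weights, read as repetition counts

# Weighted mean of each column, then deviations from it.
mean = np.average(X, axis=0, weights=w)
xm = X - mean

# Weighted covariance with the (sum of weights - 1) denominator.
sigma2 = (xm * w[:, None]).T @ xm / (w.sum() - 1)

# Reference: ordinary sample covariance of the rows repeated w times.
expanded = np.repeat(X, w, axis=0)
reference = np.cov(expanded, rowvar=False)  # default ddof=1
print(np.allclose(sigma2, reference))
```

If your weights are normalised or express reliability rather than counts, this denominator is probably not the right unbiased correction, so treat the check as specific to count-like weights.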


