Question:
I am trying to calculate the running median, mean and std of a large array. I know how to calculate the running mean as below:
    import numpy as np

    def running_mean(x, N):
        cumsum = np.cumsum(np.insert(x, 0, 0))
        return (cumsum[N:] - cumsum[:-N]) / float(N)
This works very efficiently, but I do not quite understand why (cumsum[N:] - cumsum[:-N]) / float(N) gives the mean value (I borrowed it from someone else).
I tried adding another value to the return statement to calculate the median, but it does not do what I want:
    return (cumsum[N:] - cumsum[:-N]) / float(N), np.median(cumsum[N:] - cumsum[:-N])
Can anyone offer a hint on how to approach this problem? Thank you very much.
Huanian Zhang
Answer 1:
That cumsum trick is specific to finding sum or average values, and I don't think you can extend it simply to get median and std values. One approach to performing a generic reduction (np.mean, np.median, np.std and so on) in a sliding/running window on a 1D array is to create a series of 1D sliding-window indices stacked as a 2D array, one window per row, and then apply the function along each row (axis=1). To get those indices, you can use broadcasting.
Thus, to compute the running mean, it would look like this:
    idx = np.arange(N) + np.arange(len(x)-N+1)[:,None]
    out = np.mean(x[idx],axis=1)
For the running median and std, just replace np.mean with np.median and np.std respectively.
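As a minimal, self-contained sketch of the approach above (the function names `running_stats` and `running_mean_cumsum` are illustrative and not part of the original answer), the snippet below builds the window indices once and reuses them for all three statistics. It also cross-checks the mean against the cumsum version from the question, with a comment on why that trick works for sums but not for medians.

```python
import numpy as np

def running_stats(x, N):
    """Return running mean, median and std over all full windows of length N."""
    x = np.asarray(x, dtype=float)
    # Broadcasting a row (0..N-1) against a column of window start positions
    # gives a 2D index array: row i holds i, i+1, ..., i+N-1.
    idx = np.arange(N) + np.arange(len(x) - N + 1)[:, None]
    windows = x[idx]  # shape (len(x) - N + 1, N), one window per row
    return (np.mean(windows, axis=1),
            np.median(windows, axis=1),
            np.std(windows, axis=1))

def running_mean_cumsum(x, N):
    """The cumsum version from the question, for comparison."""
    # After prepending 0, c[k] is the sum of the first k elements of x,
    # so c[i + N] - c[i] is the sum of x[i:i+N]. Dividing by N gives the mean.
    # A median cannot be recovered from such differences of sums.
    c = np.cumsum(np.insert(x, 0, 0))
    return (c[N:] - c[:-N]) / float(N)

x = np.array([1.0, 3.0, 2.0, 8.0, 4.0])
mean, median, std = running_stats(x, 3)
print(median)  # medians of the windows [1,3,2], [3,2,8], [2,8,4]: 2, 3, 4
print(np.allclose(mean, running_mean_cumsum(x, 3)))  # both methods agree
```

Note that `x[idx]` materializes a copy of size (len(x) - N + 1) x N, which can be costly for a large array with a wide window. On NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view(x, N)` should give the same 2D layout as a view without the copy.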
Answer 2:
To estimate the mean and standard deviation of a given sample set, there exist incremental algorithms (std, mean) that keep the computational load low and allow online estimation. Computing the median exactly requires sorting, but you can approximate it. Let x(t) be your data at time t, m(t) the median estimate at time t, m(t-1) the previous median estimate, and e a small number, e.g. e = 0.001. Then
m(t) = m(t-1) + e, if m(t-1) < x(t)
m(t) = m(t-1) - e, if m(t-1) > x(t)
m(t) = m(t-1), else
If you have a good initial guess of the median m(0), this works well. e should be chosen in relation to the range of your values and how many samples you expect. E.g. if x = [-4, 2, 7.5, 2], e = 0.05 would be good; if x = [1000, 3153, -586, -29], e = 10.
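The ideas in this answer can be sketched in plain Python as follows. The mean/std part uses Welford's update, which is one well-known incremental algorithm and is assumed here, since it may not be the exact one the answer linked to. The median part follows the m(t) update rule above. The names `online_mean_std` and `approximate_median` are illustrative. Both functions track statistics over all samples seen so far, not over a fixed-length window.

```python
import math

def online_mean_std(samples):
    """Welford-style update: yield (mean, population std) after each sample."""
    count, mean, m2 = 0, 0.0, 0.0
    for value in samples:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)  # second factor uses the updated mean
        yield mean, math.sqrt(m2 / count)

def approximate_median(samples, initial_guess, step):
    """Move the estimate by `step` toward each new sample (the rule for m(t))."""
    estimate = initial_guess
    for value in samples:
        if estimate < value:
            estimate += step
        elif estimate > value:
            estimate -= step
        # if they are equal, the estimate is left unchanged
        yield estimate

data = [-4, 2, 7.5, 2]  # example values taken from the answer
final_mean, final_std = list(online_mean_std(data))[-1]
median_trace = list(approximate_median(data, initial_guess=2.0, step=0.05))
print(final_mean, final_std)
print(median_trace)
```

With only four samples the median estimate can move at most 4 * step away from the initial guess, which illustrates why the answer ties the choice of e to both the value range and the number of samples expected.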