How to handle NaNs in binning with numpy add.reduceat?

Submitted by 荒凉一梦 on 2019-12-11 06:48:07

Question


I'm using NumPy's reduceat method for binning data. Background: I'm processing measurement data sampled at high frequencies, and I need to down-sample it by computing the mean of each bin of a fixed size. Since I have millions of samples, I need something fast. In principle, this works like a charm:

import numpy as np
def bin_by_npreduceat(v, nbins):
    # integer bin edges spanning the whole array
    bins = np.linspace(0, len(v), nbins + 1, True).astype(int)
    # per-bin sums divided by per-bin sample counts give the bin means
    return np.add.reduceat(v, bins[:-1]) / np.diff(bins)
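As a quick sanity check on clean data (a hypothetical run with made-up sample values):

v = np.arange(15, dtype=float)
bin_by_npreduceat(v, 3)
Out[109]: array([ 2.,  7., 12.])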

The problem is that NaNs can occur (rarely, but it happens). The consequence: the whole bin becomes NaN, since np.add propagates NaNs:

v = np.array([1,np.nan,3,4,5,4,3,5,6,7,3,2,5,6,9])
bin_by_npreduceat(v, 3)
Out[110]: array([nan,  5.,  5.])

Does anybody know how I can fix this? Unfortunately, np.nansum has no reduceat...
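For comparison, a plain Python loop with np.nanmean (a minimal sketch reusing the bin edges from above) gives the result I want, but it is far too slow for millions of samples:

bins = np.linspace(0, len(v), 3 + 1, True).astype(int)
np.array([np.nanmean(v[i:j]) for i, j in zip(bins[:-1], bins[1:])])
Out[111]: array([3.25, 5.  , 5.  ])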


Answer 1:


We can use a masking-based method (reusing the bin edges from the question's function) -

# Bin edges, as computed in the question's bin_by_npreduceat
bins = np.linspace(0, len(v), nbins + 1, True).astype(int)

# Mask of NaNs
mask = np.isnan(v)

# Replace NaNs with zeros so they don't poison the sums
vn = np.where(mask, 0, v)

# Use add.reduceat on the NaN-zeroed array to get per-bin sums,
# use add.reduceat on the inverted mask to get per-bin valid counts,
# then divide them to get the final output
out = np.add.reduceat(vn, bins[:-1]) / np.add.reduceat(~mask, bins[:-1])
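Applied to the sample array from the question with nbins = 3, this skips the NaN: the first bin now averages its four valid values, (1 + 3 + 4 + 5) / 4 = 3.25:

out
Out[112]: array([3.25, 5.  , 5.  ])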


Source: https://stackoverflow.com/questions/57160558/how-to-handle-nans-in-binning-with-numpy-add-reduceat
