Taking np.average while ignoring NaN's?

Anonymous (unverified), posted 2019-12-03 08:44:33

Question:

I have a matrix of shape (64, 17), corresponding to time and latitude. I want to take a latitude-weighted average. I know np.average can do this because it accepts a weights argument, unlike np.nanmean, which I used to average over the longitudes. However, np.average doesn't ignore NaN the way np.nanmean does, so the first 5 entries of each row are included in the latitude averaging and turn the entire time series into NaN.

Is there a way I can take a weighted average without the NaNs being included in the calculation?

    import numpy as np
    from netCDF4 import Dataset  # assumed: Dataset comes from the netCDF4 package

    file = Dataset("sst_aso_1951-2014latlon_seasavgs.nc")
    sst = file.variables['sst']
    lat = file.variables['lat']

    sst_filt = np.asarray(sst)
    missing_values_indices = sst_filt < -8000000   # missing values have value -infinity
    sst_filt[missing_values_indices] = np.nan      # all missing values set to NaN

    weights = np.cos(np.deg2rad(lat))
    sst_zonalavg = np.nanmean(sst_filt, axis=2)
    print(sst_zonalavg[0,:])
    sst_ts = np.average(sst_zonalavg, axis=1, weights=weights)
    print(sst_ts[:])

Output:

    [ nan nan nan nan nan
      27.08499908 27.33333397 28.1457119 28.32899857 28.34454346
      28.27285767 28.18571472 28.10199928 28.10812378 28.03411865
      28.06411552 28.16529465]
    [ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
      nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
      nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
      nan nan nan nan]
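The behaviour described in the question can be reproduced on a small array. The following is a minimal sketch with made-up values (the array and weights are hypothetical stand-ins, not the SST data): np.nanmean skips the NaN column, while np.average lets it propagate into every row.

```python
import numpy as np

# Hypothetical 2 x 3 stand-in for the (time, latitude) array in the question.
data = np.array([[np.nan, 1.0, 3.0],
                 [np.nan, 2.0, 4.0]])
weights = np.array([0.5, 1.0, 1.0])

plain_mean = np.nanmean(data, axis=1)                  # NaNs are skipped
weighted = np.average(data, axis=1, weights=weights)   # NaNs propagate

print(plain_mean)
print(weighted)
```

Because every row contains a NaN in the first column, every weighted result is NaN, which matches the all-NaN time series shown above.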

Answer 1:

You can create a masked array like this:

    import numpy as np

    data = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [0, 0, 0]])
    # mask every NaN entry so it is excluded from the average
    cleaned_data = np.ma.masked_array(data, np.isnan(data))
    # calculate your weighted average here instead
    weights = [1, 1, 1]
    average = np.ma.average(cleaned_data, axis=1, weights=weights)
    # this gives you the result
    print(average.filled(np.nan))

This outputs:

[ 2.   4.5  6.   0. ] 
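Applied to the situation in the question, the masked-array approach might look like the sketch below. The latitudes and temperature values are invented for illustration, and np.ma.masked_invalid is used here as a shortcut for building the NaN mask; only the cosine weighting is taken from the question.

```python
import numpy as np

# Hypothetical latitudes and a small (time, latitude) array; values are made up.
lat = np.array([-60.0, 0.0, 60.0])
sst_zonalavg = np.array([[np.nan, 28.0, 26.0],
                         [np.nan, 27.0, 25.0]])

weights = np.cos(np.deg2rad(lat))             # roughly [0.5, 1.0, 0.5]
masked = np.ma.masked_invalid(sst_zonalavg)   # masks NaN (and inf) entries
sst_ts = np.ma.average(masked, axis=1, weights=weights)

print(sst_ts.filled(np.nan))
```

The weights of masked entries are dropped from the normalization, so each row is averaged only over its valid latitudes.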


Answer 2:

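You can simply multiply the input array by the weights and sum along the specified axis, ignoring NaNs, with np.nansum. Note that this is a weighted sum in which NaN entries contribute zero; it equals the weighted average only when the weights of the non-NaN entries sum to 1. For your case, assuming the weights are to be used along axis = 1 on the input array sst_filt, it would be -

    np.nansum(sst_filt*weights,axis=1)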

For a generic case, a function could be defined as follows -

    def nanaverage(A,weights,axis):
        return np.nansum(A*weights,axis=axis)

Sample run -

    In [200]: sst_filt  # 2D array case
    Out[200]:
    array([[  0.,   1.],
           [ nan,   3.],
           [  4.,   5.]])

    In [201]: weights
    Out[201]: array([ 0.25,  0.75])

    In [202]: nanaverage(sst_filt,weights=weights,axis=1)
    Out[202]: array([ 0.75,  2.25,  4.75])
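In the sample run, the second row returns 2.25 even though its only valid value is 3, because the weight of the NaN entry is still counted implicitly. If a true weighted average is wanted, one option is to divide by the sum of the weights of the valid entries. The helper below is a hypothetical variant (nanaverage_normalized is not part of NumPy or of the answer above) and assumes the weights broadcast against the array, as they do for axis=1 here.

```python
import numpy as np

def nanaverage_normalized(A, weights, axis):
    """Weighted average along `axis`, ignoring NaNs and renormalizing weights.

    Hypothetical helper; assumes `weights` broadcasts against `A`.
    """
    A = np.asarray(A, dtype=float)
    valid = ~np.isnan(A)
    # Give zero weight to every NaN position.
    w = np.broadcast_to(weights, A.shape) * valid
    numerator = np.nansum(A * w, axis=axis)
    denominator = w.sum(axis=axis)
    return numerator / denominator

sst_filt = np.array([[0.0, 1.0], [np.nan, 3.0], [4.0, 5.0]])
weights = np.array([0.25, 0.75])
result = nanaverage_normalized(sst_filt, weights, axis=1)
print(result)  # [0.75 3.   4.75]
```

A row that is entirely NaN has a zero denominator and comes out as NaN (with a runtime warning), which is usually the desired outcome.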


Answer 3:

I'd probably just select the portion of the array that isn't NaN and then use those indices to select the weights too.

For example:

    import numpy as np

    data = np.random.rand(10)
    weights = np.random.rand(10)
    data[[2, 4, 8]] = np.nan

    print(data)
    # [ 0.32849204,  0.90310062,         nan,  0.58580299,         nan,
    #    0.934721  ,  0.44412978,  0.78804409,         nan,  0.24942098]

    ii = ~np.isnan(data)
    print(ii)
    # [ True  True False  True False  True  True  True False  True]

    result = np.average(data[ii], weights = weights[ii])
    print(result)
    # .6470319

Edit: I realized this won't work with two-dimensional arrays. In that case, I'd probably just set both the values and the weights to zero wherever the data is NaN. This yields the same result as if those entries were not included in the calculation.

Before running np.average:

    nan_mask = np.isnan(data)   # compute the mask once, before data is overwritten
    data[nan_mask] = 0
    weights[nan_mask] = 0
    result = np.average(data, weights=weights)

Or create copies if you want to keep track of which indices were NaN.
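For the two-dimensional case with one weight per column, the same idea can be sketched with np.where, which builds copies and leaves the originals untouched. The data and weights below are hypothetical; the weights are broadcast to the shape of the data so that each row gets its own zeroed entries.

```python
import numpy as np

# Hypothetical 2-D data with one weight per column.
data = np.array([[np.nan, 2.0, 4.0],
                 [1.0, np.nan, 3.0]])
weights = np.array([1.0, 2.0, 1.0])

nan_mask = np.isnan(data)
data_filled = np.where(nan_mask, 0.0, data)      # copy with NaNs replaced by 0
weights_2d = np.where(nan_mask, 0.0, weights)    # broadcast weights, zeroed at NaNs

result = np.average(data_filled, axis=1, weights=weights_2d)
print(result)
```

One caveat: if a row is entirely NaN, its weights sum to zero and np.average raises ZeroDivisionError, so such rows would need to be handled separately.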


