Set values in numpy array to NaN by index

走了就别回头了 2020-12-19 22:36

I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).

I tried

import numpy

x = nump         


        
2 Answers
  •  醉话见心
    2020-12-19 22:58

    Vectorized approach to set appropriate elements as NaNs

    @unutbu's solution should get rid of the value error you were getting. If you want a vectorized version for performance, you can use boolean indexing like so -

    import numpy as np
    
    # Create mask of positions in x (with float datatype) where NaNs are to be put
    mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])
    
    # Put NaNs into masked region of x for the desired output
    x[mask] = np.nan
    

    Sample run -

    In [92]: x = np.random.randint(0,9,(4,7)).astype(float)
    
    In [93]: x
    Out[93]: 
    array([[ 2.,  1.,  5.,  2.,  5.,  2.,  1.],
           [ 2.,  5.,  7.,  1.,  5.,  4.,  8.],
           [ 1.,  1.,  7.,  4.,  8.,  3.,  1.],
           [ 5.,  8.,  7.,  5.,  0.,  2.,  1.]])
    
    In [94]: cutoff = [5,3,0,6]
    
    In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan
    
    In [96]: x
    Out[96]: 
    array([[ nan,  nan,  nan,  nan,  nan,   2.,   1.],
           [ nan,  nan,  nan,   1.,   5.,   4.,   8.],
           [  1.,   1.,   7.,   4.,   8.,   3.,   1.],
           [ nan,  nan,  nan,  nan,  nan,  nan,   1.]])
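
Since the goal in the question is a row-wise mean that skips the NaN entries, here is a minimal sketch of that last step using `np.nanmean`. It reuses the `x` and `cutoff` values from the sample run above; the name `row_means` is only illustrative and not from the original post.

```python
import numpy as np

# Same data as the sample run above (float dtype, so NaN can be stored)
x = np.array([[2., 1., 5., 2., 5., 2., 1.],
              [2., 5., 7., 1., 5., 4., 8.],
              [1., 1., 7., 4., 8., 3., 1.],
              [5., 8., 7., 5., 0., 2., 1.]])
cutoff = [5, 3, 0, 6]

# True where the column index is below the row's cutoff
mask = np.asarray(cutoff)[:, None] > np.arange(x.shape[1])
x[mask] = np.nan

# Row-wise mean that ignores the NaN entries
# (row_means is an illustrative name)
row_means = np.nanmean(x, axis=1)
print(row_means)
```

Note that if a cutoff covers a whole row, that row is all NaN, and `np.nanmean` returns NaN for it and emits a RuntimeWarning.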
    

    Vectorized approach to directly calculate row-wise mean of appropriate elements

    If you only need the masked mean values, you can modify the vectorized approach above to avoid dealing with NaNs altogether and, more importantly, keep x as an integer array. Here's the modified approach -

    # Get array version of cutoff
    cutoff_arr = np.asarray(cutoff)
    
    # Mask of positions in x which are to be considered for row-wise mean calculations
    mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
    
    # Mask x, calculate the corresponding sum and thus mean values for each row
    masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)
    

    Here's a sample run for such a solution -

    In [61]: x = np.random.randint(0,9,(4,7))
    
    In [62]: x
    Out[62]: 
    array([[5, 0, 1, 2, 4, 2, 0],
           [3, 2, 0, 7, 5, 0, 2],
           [7, 2, 2, 3, 3, 2, 3],
           [4, 1, 2, 1, 4, 6, 8]])
    
    In [63]: cutoff = [5,3,0,6]
    
    In [64]: cutoff_arr = np.asarray(cutoff)
    
    In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
    
    In [66]: mask1
    Out[66]: 
    array([[False, False, False, False, False,  True,  True],
           [False, False, False,  True,  True,  True,  True],
           [ True,  True,  True,  True,  True,  True,  True],
           [False, False, False, False, False, False,  True]], dtype=bool)
    
    In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)
    
    In [68]: masked_mean_vals
    Out[68]: array([ 1.        ,  3.5       ,  3.14285714,  8.        ])
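
As a cross-check, the same means can probably be obtained with NumPy's masked arrays, which also leave `x` as an integer array. This is a swapped-in alternative, not the method used above; the names `exclude` and `masked_x` are illustrative only.

```python
import numpy as np

# Same integer data as the sample run above
x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2],
              [7, 2, 2, 3, 3, 2, 3],
              [4, 1, 2, 1, 4, 6, 8]])
cutoff_arr = np.asarray([5, 3, 0, 6])

# In np.ma, True in the mask means "ignore this element",
# so this is the inverse of mask1 above
exclude = cutoff_arr[:, None] > np.arange(x.shape[1])
masked_x = np.ma.masked_array(x, mask=exclude)

# Row-wise mean over the unmasked elements only
masked_mean_vals = masked_x.mean(axis=1)
print(masked_mean_vals)
```

Unlike the sum-and-divide version, a fully masked row gives a masked result here rather than a division by zero.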
    
