How to sort a numpy array with key as isnan?

遥遥无期 2020-12-21 16:26

I have a numpy array like

np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
       [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
       [2.0, 5         


        
3 Answers
  • 2020-12-21 17:03

    You can't do this with an object array that contains nan. You would need to find a numeric type that everything fits into. When nan is stored as a Python object rather than in a float array, comparisons with <, >, and == all return False, so a comparison-based sort has no consistent place to put it.

    Additionally, True and False are equivalent to 1 and 0, so I don't think there is any way to get your expected result.

    You would have to check whether converting the dtype to float gives proper results for your use case.
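
To make the points above concrete, here is a minimal sketch. The row is taken from the sample data in this thread; the variable names are only illustrative.

```python
import numpy as np

nan = float("nan")

# NaN compares False against everything, including itself.
comparisons = (nan < 1.0, nan > 1.0, nan == nan)

# bool is a subclass of int, so True == 1 and False == 0.
bool_equalities = (True == 1, False == 0)

# One row of mixed values stored as an object array.
row = np.array([1.0, np.nan, 5.0, 1, True, True, np.nan, True], dtype=object)

# Casting to float makes np.isnan usable, but True becomes 1.0,
# so booleans and numbers can no longer be told apart afterwards.
as_float = row.astype(float)
nan_mask = np.isnan(as_float)
```

After the cast, `nan_mask` marks the two NaN positions, but `as_float` no longer records which entries were booleans. That loss is why converting the dtype may or may not suit a given use case.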

  • 2020-12-21 17:12

    Approach #1

    Here's a vectorized approach borrowing the concept of masking from this post -

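    import numpy as np

    def mask_app(a):
        out = np.empty_like(a)
        # True where the element is NaN (object arrays need a float cast first)
        mask = np.isnan(a.astype(float))
        # Sorting booleans along each row pushes True (the NaN slots) to the end
        mask_sorted = np.sort(mask, axis=1)
        # Fill NaN slots and non-NaN slots separately, keeping the original order
        out[mask_sorted] = a[mask]
        out[~mask_sorted] = a[~mask]
        return out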
    

    Sample run -

    # Input dataframe
    In [114]: data
    Out[114]: 
       ID_1  ID_2  ID_3  Key    Var  Var_1  Var_2 Var_3
    0   1.0   NaN   5.0    1   True   True    NaN  True
    1   NaN   4.0   7.0    2   True    NaN  False  True
    2   2.0   5.0   NaN    3  False  False   True   NaN
    
    # Use pandas approach for verification    
    In [115]: data.apply(lambda x : sorted(x,key=pd.isnull),1).values
    Out[115]: 
    array([[1.0, 5.0, 1, True, True, True, nan, nan],
           [4.0, 7.0, 2, True, False, True, nan, nan],
           [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)
    
    # Use proposed approach and verify
    In [116]: mask_app(data.values)
    Out[116]: 
    array([[1.0, 5.0, 1, True, True, True, nan, nan],
           [4.0, 7.0, 2, True, False, True, nan, nan],
           [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)
    

    Approach #2

    With a few more modifications, here's a simplified version using the idea from this post -

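    def mask_app2(a):
        # Start with an all-NaN output, so only valid values need to be copied
        out = np.full(a.shape, np.nan, dtype=a.dtype)
        # True where the element is not NaN
        mask = ~np.isnan(a.astype(float))
        # Sort each row's mask and reverse it so the True slots come first
        out[np.sort(mask, axis=1)[:, ::-1]] = a[mask]
        return out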
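
A self-contained run of this second approach might look like the sketch below. The rows are copied from the sample DataFrame output shown earlier in this answer; building them directly as an object array (rather than through pandas) is an assumption made to keep the example standalone.

```python
import numpy as np

def mask_app2(a):
    # Same logic as the function in the answer, repeated so this runs alone.
    out = np.full(a.shape, np.nan, dtype=a.dtype)
    mask = ~np.isnan(a.astype(float))
    out[np.sort(mask, axis=1)[:, ::-1]] = a[mask]
    return out

a = np.array([
    [1.0, np.nan, 5.0, 1, True, True, np.nan, True],
    [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
    [2.0, 5.0, np.nan, 3, False, False, True, np.nan],
], dtype=object)

result = mask_app2(a)

# Each row has six valid values, so the NaNs end up in the last two columns.
trailing_are_nan = np.isnan(result[:, 6:].astype(float)).all()
```

Because the array keeps `dtype=object`, the booleans and integers come back as the original Python objects, in their original left-to-right order.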
    
  • 2020-12-21 17:12

    Since you have an object array anyway, do the sorting in Python, then make your array. You can write a key that does something like this:

    from math import isnan
    
    def key(x):
        if isnan(x):
            t = 3
            x = 0
        elif isinstance(x, bool):
            t = 2
        else:
            t = 1
        return t, x
    

    This key returns a two-element tuple, where the first element gives the preliminary ordering by type. It considers all NaNs to be equal and greater than any other type.
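
As a quick illustration, the sketch below applies this key to a single row from the sample data. Note that, unlike a key that only tests for NaN, this one also reorders the non-NaN values by type and then by value.

```python
from math import isnan

def key(x):
    # Same key as above, repeated so the example runs on its own.
    if isnan(x):
        t = 3
        x = 0
    elif isinstance(x, bool):
        t = 2
        x = x
    else:
        t = 1
    return t, x

nan = float("nan")
row = [1.0, nan, 5.0, 1, True, True, nan, True]

# The tuples that sorted() actually compares.
keys = [key(x) for x in row]

# Numbers first (by value), then booleans, then NaNs.
sorted_row = sorted(row, key=key)
```

Here `sorted_row` comes out as `[1.0, 1, 5.0, True, True, True, nan, nan]`. Python's sort is stable, so `1.0` stays ahead of the equal-valued `1`.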

    Even if you start with data in a DataFrame, you can do something like:

    values = [sorted(row, key=key) for row in data.values]
    # np.object was removed in newer NumPy releases; the builtin object works everywhere
    values = np.array(values, dtype=object)
    

    You can replace the list comprehension with np.apply_along_axis if that suits your needs better:

    # dtype=object keeps the booleans from being coerced to floats
    values = np.apply_along_axis(lambda row: np.array(sorted(row, key=key), dtype=object),
                                 axis=1, arr=data.values)
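
The two variants can be compared end to end with the sketch below. It assumes the input is an object array with the same rows as the sample DataFrame in this thread. It also passes `dtype=object` when building each sorted row, because `np.array` would otherwise coerce a mixed row of floats and booleans to a float array.

```python
from math import isnan
import numpy as np

def key(x):
    # NaNs last, booleans before them, plain numbers first.
    if isnan(x):
        return 3, 0
    if isinstance(x, bool):
        return 2, x
    return 1, x

a = np.array([
    [1.0, np.nan, 5.0, 1, True, True, np.nan, True],
    [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
    [2.0, 5.0, np.nan, 3, False, False, True, np.nan],
], dtype=object)

# Variant 1: sort each row in plain Python, then rebuild the array.
via_list = np.array([sorted(row, key=key) for row in a], dtype=object)

# Variant 2: let NumPy iterate over the rows.
via_apply = np.apply_along_axis(
    lambda row: np.array(sorted(row, key=key), dtype=object),
    axis=1, arr=a)
```

Both variants should yield the same object array. The list comprehension is arguably easier to read, and `np.apply_along_axis` still loops in Python, so it is unlikely to be meaningfully faster.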
    