I have a numpy array like
np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
[np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
[2.0, 5
You can't do this with an object array that contains nan. You would need to find a numeric type that everything fits into. When stored as an object instead of as a float, nan returns False for <, >, and ==, so it has no well-defined position in a sort. Additionally, True and False are equivalent to 1 and 0, so I don't think there is any way to get your expected result. You would have to check whether converting the dtype to float gives proper results for your use case.
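To make the two points above concrete, here is a minimal sketch in plain Python (no NumPy needed). The variable names are just for illustration.

```python
nan = float("nan")

# Every ordering or equality comparison that involves nan is False,
# including comparing nan with itself.
nan_comparisons = [nan < 1.0, nan > 1.0, nan == nan]

# bool is a subclass of int, so True compares equal to 1 and False to 0.
bool_equalities = [True == 1, False == 0]
bool_is_int = isinstance(True, int)

print(nan_comparisons)   # [False, False, False]
print(bool_equalities)   # [True, True]
print(bool_is_int)       # True
```

This is why a plain comparison-based sort cannot place nan consistently, and why it cannot tell True apart from 1.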
Approach #1
Here's a vectorized approach borrowing the concept of masking from this post -
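    import numpy as np

    def mask_app(a):
        out = np.empty_like(a)
        # True where the input holds NaN
        mask = np.isnan(a.astype(float))
        # Sorting each row of the mask pushes the True (NaN) slots to the end
        mask_sorted = np.sort(mask,1)
        out[mask_sorted] = a[mask]      # NaNs go to the trailing slots
        out[~mask_sorted] = a[~mask]    # everything else keeps its order
        return out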
Sample run -
    # Input dataframe
    In [114]: data
    Out[114]:
       ID_1  ID_2  ID_3  Key    Var  Var_1  Var_2 Var_3
    0   1.0   NaN   5.0    1   True   True    NaN  True
    1   NaN   4.0   7.0    2   True    NaN  False  True
    2   2.0   5.0   NaN    3  False  False   True   NaN

    # Use pandas approach for verification
    In [115]: data.apply(lambda x : sorted(x,key=pd.isnull),1).values
    Out[115]:
    array([[1.0, 5.0, 1, True, True, True, nan, nan],
           [4.0, 7.0, 2, True, False, True, nan, nan],
           [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)

    # Use proposed approach and verify
    In [116]: mask_app(data.values)
    Out[116]:
    array([[1.0, 5.0, 1, True, True, True, nan, nan],
           [4.0, 7.0, 2, True, False, True, nan, nan],
           [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)
Approach #2
With a few more modifications, here is a simplified version using the idea from this post -
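    def mask_app2(a):
        # Start from an all-NaN output, then fill in the non-NaN values
        out = np.full(a.shape,np.nan,dtype=a.dtype)
        mask = ~np.isnan(a.astype(float))
        # Sorted and reversed, the mask marks the leading slots of each row
        out[np.sort(mask,1)[:,::-1]] = a[mask]
        return out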
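As a quick check of the second approach, here is a small self-contained sketch that runs mask_app2 on the first two rows from the question, built directly as an object array instead of going through a DataFrame. The names a and result are only for illustration.

```python
import numpy as np

def mask_app2(a):
    # Start from an all-NaN output, then fill in the non-NaN values
    out = np.full(a.shape, np.nan, dtype=a.dtype)
    mask = ~np.isnan(a.astype(float))
    # Sorted and reversed, the mask marks the leading slots of each row
    out[np.sort(mask, 1)[:, ::-1]] = a[mask]
    return out

# First two rows of the question's data, kept as Python objects
a = np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
              [np.nan, 4.0, 7.0, 2, True, np.nan, False, True]],
             dtype=object)

result = mask_app2(a)
print(result)
```

The non-NaN values keep their original left-to-right order within each row, and the NaNs end up at the end, which should match the first two rows of the mask_app output shown above.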
Since you have an object array anyway, do the sorting in Python, then make your array. You can write a key that does something like this:
    from math import isnan

    def key(x):
        if isnan(x):
            t = 3
            x = 0
        elif isinstance(x, bool):
            t = 2
        else:
            t = 1
        return t, x
This key returns a two-element tuple, where the first element gives the preliminary ordering by type. It considers all NaNs to be equal and greater than any other type.
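Here is a minimal sketch of that key applied to one row with plain sorted(), using the second row of the question's data. The key function is repeated so the snippet runs on its own; row and ordered are illustrative names.

```python
from math import isnan

def key(x):
    if isnan(x):
        t = 3   # NaNs sort last
        x = 0   # and all compare equal to each other
    elif isinstance(x, bool):
        t = 2   # booleans after the plain numbers
    else:
        t = 1   # ints and floats first
    return t, x

nan = float("nan")
row = [nan, 4.0, 7.0, 2, True, nan, False, True]
ordered = sorted(row, key=key)
print(ordered)  # [2, 4.0, 7.0, False, True, True, nan, nan]
```

Note that this also orders the values inside each type group (2 moves ahead of 4.0, and False ahead of True), so the result is not the same as with the masking approaches, which keep the non-NaN values in their original order.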
Even if you start with data in a DataFrame, you can do something like:
    values = [sorted(row, key=key) for row in data.values]
    # np.object was removed in NumPy 1.24; the builtin object works everywhere
    values = np.array(values, dtype=object)
You can replace the list comprehension with np.apply_along_axis if that suits your needs better:
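    # dtype=object keeps the mixed types; otherwise NumPy infers a float
    # array from the first sorted row and the booleans become 1.0 / 0.0
    values = np.apply_along_axis(
        lambda row: np.array(sorted(row, key=key), dtype=object),
        axis=1, arr=data.values)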