Retrieve indices of NaN values in a pandas dataframe

粉色の甜心 2020-12-18 03:50

For each row containing NaN values, I am trying to retrieve the indices (labels) of all the columns that hold those NaNs.

d=[[11.4,1.3,2.0, NaN],[11.4,1.3,NaN, NaN],[11.4,1.3,2.8,          


        
4 Answers
  • 2020-12-18 04:02

    It should be efficient to use a scipy coordinate-format sparse matrix to retrieve the coordinates of the null values:

    import scipy.sparse as sp
    
    x,y = sp.coo_matrix(df.isnull()).nonzero()
    print(list(zip(x,y)))
    
    [(0, 3), (1, 2), (1, 3), (3, 0), (3, 1)]
    

    Note that I call the nonzero method only to get the coordinates of the nonzero entries of the underlying sparse matrix; I don't care about the actual values, which are all True.
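If you would rather not depend on SciPy, the same coordinates can be obtained with NumPy's `np.nonzero` applied directly to the dense boolean mask. This is a minimal sketch that swaps in NumPy for the sparse matrix; it is not the answer's own method. The frame `df` is a hypothetical reconstruction: its rows match the outputs shown in this thread, but row 2's last value (0.7) is a placeholder because the question's data is truncated.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; row 2's last value (0.7) is a placeholder,
# since the question's data is truncated.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
)

# np.nonzero on a 2-D boolean array returns (row positions, column positions)
# in row-major order, i.e. the same pairs the sparse-matrix approach yields.
rows, cols = np.nonzero(df.isna().to_numpy())
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(0, 3), (1, 2), (1, 3), (3, 0), (3, 1)]

# Map the positions back to index/column labels if names are preferred.
labels = [(df.index[r], df.columns[c]) for r, c in coords]
print(labels)
```

The label mapping only matters when the frame has a non-default index or you want column names rather than column numbers.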

  • 2020-12-18 04:12

    Another way is to unstack the null mask and keep only the True entries, i.e. the positions of the NaNs:

    In [11]: df_null = df.isnull().unstack()
    
    In [12]: t = df_null[df_null]
    
    In [13]: t
    Out[13]:
    A  3    True
    B  3    True
    C  1    True
    D  0    True
       1    True
    dtype: bool
    

    This gets you most of the way and may be enough, although it may be easier to work with a Series mapping each row label to a column label:

    In [14]: s = pd.Series(t.index.get_level_values(0), index=t.index.get_level_values(1)).sort_index()
    
    In [15]: s
    Out[15]:
    0    D
    1    C
    1    D
    3    A
    3    B
    dtype: object
    

    For example, if you wanted the lists (though I don't think you would need them):

    In [16]: s.groupby(level=0).apply(list)
    Out[16]:
    0       [D]
    1    [C, D]
    3    [A, B]
    dtype: object
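The REPL steps above can be condensed into a self-contained script. This is a sketch assuming a hypothetical frame whose rows match the outputs in this thread (row 2's last value is a placeholder, since the question's data is truncated). It puts the column labels (level 0) as values and the row labels (level 1) as the index, then groups per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; row 2's last value is a placeholder.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
)

# unstack() turns the boolean mask into a Series indexed by (column, row).
df_null = df.isnull().unstack()
# Keep only the True entries, i.e. the NaN positions.
t = df_null[df_null]

# Level 0 holds the column labels, level 1 the row labels.
# A stable sort keeps the column order within each row.
s = pd.Series(
    t.index.get_level_values(0), index=t.index.get_level_values(1)
).sort_index(kind="stable")

# Collect the column labels of the NaNs for each row.
per_row = s.groupby(level=0).apply(list)
print(per_row.to_dict())  # {0: ['D'], 1: ['C', 'D'], 3: ['A', 'B']}
```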
    
  • 2020-12-18 04:23

    Another, simpler way, if you only need to know which rows contain NaN:

    >>> df.isnull().any(axis=1)
    0     True
    1     True
    2    False
    3     True
    dtype: bool
    

    To subset:

    >>> bool_idx = df.isnull().any(axis=1)
    >>> df[bool_idx]
          A    B    C    D
    0  11.4  1.3  2.0  NaN
    1  11.4  1.3  NaN  NaN
    3   NaN  NaN  2.8  0.7
    

    To get the integer index:

    >>> df[bool_idx].index
    Int64Index([0, 1, 3], dtype='int64')
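Note that `df[bool_idx].index` returns index labels, which equal integer positions only for the default `RangeIndex` (pandas 2.0 removed `Int64Index`, so newer versions display `Index([0, 1, 3], dtype='int64')`). A minimal sketch showing the difference, assuming a hypothetical frame with a custom index (the labels 10 to 40 are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, so labels and positions differ.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
    index=[10, 20, 30, 40],
)

bool_idx = df.isnull().any(axis=1)

# Index labels of the rows that contain at least one NaN.
labels = df.index[bool_idx.to_numpy()].tolist()
# Integer positions of those rows, regardless of the index labels.
positions = np.flatnonzero(bool_idx.to_numpy()).tolist()

print(labels)     # [10, 20, 40]
print(positions)  # [0, 1, 3]
```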
    
  • 2020-12-18 04:24

    You can iterate through each row of the dataframe, build a mask of its null values, and collect the index of the masked entries (i.e. the column labels).

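    lst = []
    for _, row in df.iterrows():
        # Boolean Series: True where this row's value is NaN
        mask = row.isnull()
        # A row Series is indexed by the column labels
        lst.append(row[mask].index.tolist())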
    
    >>> lst
    [['D'], ['C', 'D'], [], ['A', 'B']]
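The same lists can be built without `iterrows` by indexing `df.columns` with each row of the boolean mask taken as a NumPy array. This is a sketch of an alternative, not the answer's own code, assuming a hypothetical frame whose rows match the outputs in this thread (row 2's last value is a placeholder, since the question's data is truncated).

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; row 2's last value is a placeholder.
df = pd.DataFrame(
    [[11.4, 1.3, 2.0, np.nan],
     [11.4, 1.3, np.nan, np.nan],
     [11.4, 1.3, 2.8, 0.7],
     [np.nan, np.nan, 2.8, 0.7]],
    columns=["A", "B", "C", "D"],
)

# One boolean row per DataFrame row; True marks a NaN.
mask = df.isna().to_numpy()
# Boolean-indexing the column Index picks the labels of the NaN columns.
lst = [df.columns[row_mask].tolist() for row_mask in mask]
print(lst)  # [['D'], ['C', 'D'], [], ['A', 'B']]
```

Rows without NaN give an empty list, matching the loop's output; adding `if row_mask.any()` to the comprehension skips them.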
    