Find empty or NaN entry in Pandas Dataframe

Posted by 女生的网名这么多〃 on 2019-12-18 01:59:00

Question


I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.

Here is a dataframe that I am working with:

cl_id            a           c         d         e        A1              A2             A3
    0            1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1            2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2            3   -1.259333  1.074986  1.834653    system                       1.100353
    3            4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4            5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5            6   -1.549588 -0.198588  0.373476     audio       -0.508501               
    6            7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7            8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings, because the file that the dataframe came from had no alphanumeric content in those positions.

Given this dataframe, how can I get a list of the indexes where a NaN or blank entry occurs?


Answer 1:


np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]
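The positional pairs returned by np.where can be mapped back to row labels and column names. A minimal sketch, assuming a small hypothetical frame called `demo` (not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (not the original data).
demo = pd.DataFrame({
    "A1": ["text76", "system", "audio"],
    "A2": [1.537177, np.nan, -0.508501],
    "A3": [-0.271042, 1.100353, np.nan],
})

# Positional row/column indices of every NaN cell.
rows, cols = np.where(pd.isnull(demo))

# Translate positions into (row label, column name) pairs.
nan_cells = [(demo.index[r], demo.columns[c]) for r, c in zip(rows, cols)]
print(nan_cells)
```

Labels are often easier to read than positions when the frame has a non-default index or many columns.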

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better to arrange for all the blank cells to contain NaN instead, so that you can use pd.isnull. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)
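One way to arrange for blank cells to contain NaN is to replace the empty strings before checking. This is only a sketch on a hypothetical frame (`demo` and its values are made up for illustration), not necessarily how the answerer would do it:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing empty strings and NaN (not the asker's data).
demo = pd.DataFrame({
    "A1": ["text76", "", "audio"],
    "A2": [1.5, np.nan, -0.5],
    "A3": ["", "x", "y"],
})

# Turn empty strings into NaN so one vectorized isnull check covers both cases.
cleaned = demo.replace("", np.nan)

# Positional (row, column) pairs of every missing or blank cell.
rows, cols = np.where(pd.isnull(cleaned))
missing = list(zip(rows.tolist(), cols.tolist()))
print(missing)
```

replace returns a new frame, so `demo` itself keeps its empty strings unless the result is assigned back.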




Answer 2:


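To get the index of the rows where a given column holds an empty string, try this:

df[df['column_name'] == ''].index

For NaNs, the following returns a boolean Series that is True where the value is missing; it can be used to index df in the same way:

pd.isna(df['column_name'])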



Answer 3:


Partial solution for a single string column:

tmp = df['A1'].fillna('')
isEmpty = tmp == ''

This gives a boolean Series that is True where there are empty strings or NaN values.




Answer 4:


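I've resorted to

df[ (df[column_name].notnull()) & (df[column_name]!=u'') ].index

lately. That keeps only the rows where the cell is neither null nor an empty string, so it filters out both in one go. To get the index of the null or empty-string cells instead, invert the condition:

df[ (df[column_name].isnull()) | (df[column_name]==u'') ].index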
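A minimal sketch of what the notnull/!= mask in this answer selects, and how its complement picks out the missing cells. The frame `demo` and the names `has_value`, `filled_idx` and `missing_idx` are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical single column with one NaN and one empty string.
demo = pd.DataFrame({"A2": ["1.5", np.nan, "", "0.6"]})
column_name = "A2"

# Mask from the answer: True where the cell holds real content.
has_value = demo[column_name].notnull() & (demo[column_name] != "")

# Index of rows with content, and of rows that are NaN or empty.
filled_idx = demo[has_value].index.tolist()
missing_idx = demo[~has_value].index.tolist()
print(filled_idx, missing_idx)
```

Negating the mask with ~ avoids writing the isnull and == '' conditions out a second time.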




Answer 5:


To obtain all the rows that contain an empty cell in a particular column:

DF_new_row=DF_raw.loc[DF_raw['columnname']=='']

This gives the subset of DF_raw whose rows satisfy the condition.



Source: https://stackoverflow.com/questions/27159189/find-empty-or-nan-entry-in-pandas-dataframe
