Keeping NaNs with pandas dataframe inequalities

广开言路 2020-12-17 21:39

I have a pandas.DataFrame object that contains about 100 columns and 200000 rows of data. I am trying to convert it to a bool dataframe where True means that the value is gr

3 answers
  •  悲哀的现实
    2020-12-17 22:14

    You can do:

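    import numpy as np
    # Cast to float so NaN can be stored: 1.0 = True, 0.0 = False
    new_df = (df >= threshold).astype(float)
    new_df[df.isnull()] = np.nan  # np.NaN was removed in NumPy 2.0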
    

    But that is different from what you will get using the apply method. Here your mask has float dtype containing NaN, 0.0 and 1.0. In the apply solution you get object dtype with NaN, False, and True.

    Neither is safe to use as a mask, because you might not get what you expect. IEEE 754 specifies that ordered comparisons involving NaN (such as >=) evaluate to False, and the apply method implicitly violates that by returning NaN.

    The best option is to keep track of the NaNs separately; df.isnull() is quite fast when bottleneck is installed.
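
To illustrate keeping the NaNs separately, here is a minimal sketch. The sample frame, the threshold value, and the names above, missing, and known_below are assumptions made up for this example; they do not come from the question. The plain comparison yields a pure boolean frame, since NaN compares as False, and a second boolean frame records where the data was missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and threshold, for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 0.5, 2.0]})
threshold = 1.0

# Plain comparison: NaN compares as False, so the result stays boolean.
above = df >= threshold

# Missing values are tracked in their own boolean frame.
missing = df.isnull()

# The third state (known and below the threshold) can be recovered on demand.
known_below = ~above & ~missing

# Boolean indexing is well defined because `above` holds only True/False.
selected = df[above]
```

Both frames keep bool dtype, so either can be used directly as a mask, and the missing frame distinguishes "below the threshold" from "unknown" whenever that matters.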
