How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0

后端未结

关注

 9  1096

轻奢々 2021-02-01 18:07

I have a dataframe with 71 columns and 30597 rows. I want to replace all non-nan entries with 1 and the nan values with 0.

Initially I tried for-loop on each value of th

9条回答

情深已故 (楼主)

2021-02-01 18:43
I do a lot of data analysis and am interested in finding new/faster methods of carrying out operations. I had never come across jezrael's method, so I was curious to compare it with my usual method (i.e. replace by indexing). NOTE: This is not an answer to the OP's question, rather it is an illustration of the efficiency of jezrael's method. Since this is NOT an answer I will remove this post if people do not find it useful (and after being downvoted into oblivion!). Just leave a comment if you think I should remove it.

I created a moderately sized dataframe and did multiple replacements using both the df.notnull().astype(int) method and simple indexing (how I would normally do this). It turns out that the latter is slower by approximately five times. Just an fyi for anyone doing larger-scale replacements.
```
from __future__ import division, print_function

import numpy as np
import pandas as pd
import datetime as dt


# create dataframe with randomly place NaN's
data = np.ones( (1e2,1e2) )
data.ravel()[np.random.choice(data.size,data.size/10,replace=False)] = np.nan

df = pd.DataFrame(data=data)

trials = np.arange(100)


d1 = dt.datetime.now()

for r in trials:
    new_df = df.notnull().astype(int)

print( (dt.datetime.now()-d1).total_seconds()/trials.size )


# create a dummy copy of df.  I use a dummy copy here to prevent biasing the 
# time trial with dataframe copies/creations within the upcoming loop
df_dummy = df.copy()

d1 = dt.datetime.now()

for r in trials:
    df_dummy[df.isnull()] = 0
    df_dummy[df.isnull()==False] = 1

print( (dt.datetime.now()-d1).total_seconds()/trials.size )
```
This yields times of 0.142 s and 0.685 s respectively. It is clear who the winner is.
0 讨论(0)

查看其它9个回答
发布评论:

提交评论
- 加载中...