Replace spaces with NaN in a pandas DataFrame

纵饮孤独 submitted on 2019-12-24 08:49:59

Question


I have a pandas dataframe with three columns:

Name        Name2           DateTime
                            2016-06-10 05:22
                            2016-06-10 05:23
                            2016-06-10 14:25
Guest       Guest1          2016-06-10 15:32

I have to replace the blank cells with NaN, so Name and Name2 should be NaN in every row where they are blank (the first three rows above). I tried the statement below:

 df3['Name'] = df3['Name'].replace(r'[^\s+]',np.nan, regex=True)

But since I have whitespace after "Guest " in Name, every row gets replaced with NaN, including the "Guest" row.

Edit:

This is our actual data.

Name              Name2                  DateTime
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 05:22
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 05:23
\t\t-\r\n\t      \t\t-\r\n\t            2016-06-10 14:25
\t\tGuest\r\n\t  \t\tGuest1\r\n\t       2016-06-10 15:32

I used this to remove those escape characters.

df['Name'] = df['Name'].str.replace('\r','').str.replace('\t','').str.replace('\n','').str.replace('-','')

After removing those characters, I am not sure what gets inserted in that place now.


Answer 1:


Another solution is to compute the length of each value and then use boolean indexing to replace every value of length 0 or 1 with NaN:

print (df.applymap(len))
   Name  Name2  DateTime
0     0      0        16
1     0      0        16
2     0      0        16
3     5      6        16

df[df.applymap(len) < 2] = np.nan
print (df)
    Name   Name2          DateTime
0    NaN     NaN  2016-06-10 05:22
1    NaN     NaN  2016-06-10 05:23
2    NaN     NaN  2016-06-10 14:25
3  Guest  Guest1  2016-06-10 15:32
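
As a rough, self-contained sketch of the same length-based idea, the snippet below rebuilds the frame by hand from the raw values in the question's edit (assuming the `\t`, `\r` and `\n` shown there are real control characters, which is a guess). It is a variant rather than the exact code above: it uses the per-column `.str.len()` accessor instead of `applymap`, and it only touches the two name columns so `DateTime` is never compared by length.

```python
import pandas as pd

# Frame rebuilt by hand from the question's edit (assumed to hold real control characters).
df = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest\r\n\t"],
    "Name2": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest1\r\n\t"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

for col in ["Name", "Name2"]:
    # Drop leading/trailing whitespace; a placeholder cell is left as "-".
    cleaned = df[col].str.strip()
    # Series.where keeps values where the condition holds and puts NaN elsewhere,
    # so anything shorter than 2 characters (empty or a lone "-") becomes NaN.
    df[col] = cleaned.where(cleaned.str.len() >= 2)

print(df)
```

The threshold of 2 mirrors the `< 2` test in the answer; a genuine one-character name would also be turned into NaN, so the cutoff is worth checking against the real data.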



Answer 2:


Since you don't need the spaces, this should work.

df3['Name'] = df3['Name'].replace(r'\s+', '', regex=True)  # Remove all whitespace, including any inside the value
df3['Name'] = df3['Name'].replace('', np.nan)  # Replace now-empty cells with NaN (exact match, no regex needed)
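
If the values themselves should stay untouched, a possible one-step alternative is to match only cells that are empty or whitespace-only and replace those directly. The sketch below uses made-up sample values (the blank cells and the trailing space after "Guest" are assumptions based on the question's first table), and applies the regex `^\s*$` to the whole frame with `DataFrame.replace`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: blank and whitespace-only cells, plus "Guest " with a trailing space.
df3 = pd.DataFrame({
    "Name": ["", "   ", "\t", "Guest "],
    "Name2": ["", " ", "", "Guest1"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

# ^\s*$ matches only cells that are empty or consist entirely of whitespace,
# so "Guest " (which contains non-whitespace characters) is left alone.
df3 = df3.replace(r"^\s*$", np.nan, regex=True)

print(df3)
```

Unlike the two-step version, this keeps the trailing space in "Guest "; a separate `.str.strip()` on the column would remove it if needed.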



Answer 3:


If you're sure there won't be spaces between the names, you could use .str.strip() to remove any whitespace (or anything, really) off the ends of each string, and then replace the cells that are left empty.

 df3['Name'] = df3['Name'].str.strip().replace(r'^\s*$', np.nan, regex=True)
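
The "or anything, really" part can be illustrated with a small sketch: `Series.str.strip` accepts a set of characters to remove, so the stray hyphen from the question's raw data can be stripped together with the whitespace. The two sample values are copied from the question's edit (assumed to contain real control characters), and the name `junk` is purely illustrative.

```python
import numpy as np
import pandas as pd

# Two raw values from the question's edit: a placeholder cell and a real name.
df3 = pd.DataFrame({"Name": ["\t\t-\r\n\t", "\t\tGuest\r\n\t"]})

# Characters to trim from both ends: tab, carriage return, newline, space, hyphen.
junk = "\t\r\n -"

# After stripping, placeholder cells are empty strings; replace those with NaN.
df3["Name"] = df3["Name"].str.strip(junk).replace("", np.nan)

print(df3)
```

One caveat with this choice: a name that legitimately starts or ends with a hyphen would lose it, since `strip` removes every listed character from both ends.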


Source: https://stackoverflow.com/questions/37837148/replaces-spaces-with-nan-in-pandas-dataframe
