Question
I have a pandas dataframe with three columns:
Name    Name2    DateTime
                 2016-06-10 05:22
                 2016-06-10 05:23
                 2016-06-10 14:25
Guest   Guest1   2016-06-10 15:32
I have to replace empty spaces with NaN. So AccountName and AccountName2 (the Name and Name2 columns above) of rows 1, 2, 3 and 4 should be NaN. I tried the statement below:
df3['Name'] = df3['Name'].replace(r'[^\s+]',np.nan, regex=True)
But since I have whitespace after "Guest " in Name, all 5 rows get replaced with NaN.
Edit:
This is our actual data.
Name                 Name2                 DateTime
\t\t-\r\n\t          \t\t-\r\n\t           2016-06-10 05:22
\t\t-\r\n\t          \t\t-\r\n\t           2016-06-10 05:23
\t\t-\r\n\t          \t\t-\r\n\t           2016-06-10 14:25
\t\tGuest\r\n\t      \t\tGuest1\r\n\t      2016-06-10 15:32
I used this to remove those escape characters.
df['Name'] = df['Name'].str.replace('\r','').str.replace('\t','').str.replace('\n','').str.replace('-','')
After removing those characters, I am not sure what gets inserted in that place now.
Answer 1:
Another solution: compute the length of each value, then use boolean indexing to replace every value of length 0 or 1 with NaN:
print (df.applymap(len))
   Name  Name2  DateTime
0     0      0        16
1     0      0        16
2     0      0        16
3     5      6        16
df[df.applymap(len) < 2] = np.nan
print (df)
    Name   Name2          DateTime
0    NaN     NaN  2016-06-10 05:22
1    NaN     NaN  2016-06-10 05:23
2    NaN     NaN  2016-06-10 14:25
3  Guest  Guest1  2016-06-10 15:32
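For reference, here is a self-contained sketch of the same length-based idea. The sample frame is a hypothetical reconstruction of the cleaned data from the question, and the sketch swaps applymap(len) for a per-column .str.len() because applymap is deprecated in recent pandas releases; treat it as one possible variant rather than the answer's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the question's data after the escape characters
# and '-' placeholders have been stripped, leaving empty strings.
df = pd.DataFrame({
    "Name": ["", "", "", "Guest"],
    "Name2": ["", "", "", "Guest1"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

# Length of every value, column by column (all columns hold strings here).
lengths = df.apply(lambda col: col.str.len())

# Replace every value whose length is 0 or 1 with NaN; keep the rest.
result = df.mask(lengths < 2, np.nan)
print(result)
```

The threshold of 2 is taken from the answer above; it would also blank out legitimate one-character values, so it only fits data where such values cannot occur.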
Answer 2:
Since you don't need the spaces, this should work.
df3['Name'] = df3['Name'].replace(r'\s+', '', regex=True)  # Removes all whitespace
df3['Name'] = df3['Name'].replace('', np.nan)  # Replaces completely empty cells with NaN (no regex=True here: an empty pattern matches every string)
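As an illustration, the two steps can be run against the raw values shown in the question's edit. This is a minimal sketch under a few assumptions: the frame below is a hypothetical reconstruction of that data, the '-' placeholder is removed together with the whitespace (as in the question's own cleanup), and an anchored pattern is used for the second step so that only empty or whitespace-only cells match.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the raw data from the question's edit.
raw = pd.DataFrame({
    "Name": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest\r\n\t"],
    "Name2": ["\t\t-\r\n\t", "\t\t-\r\n\t", "\t\t-\r\n\t", "\t\tGuest1\r\n\t"],
    "DateTime": ["2016-06-10 05:22", "2016-06-10 05:23",
                 "2016-06-10 14:25", "2016-06-10 15:32"],
})

cols = ["Name", "Name2"]
cleaned = raw.copy()

# Step 1: delete whitespace and '-' characters anywhere in the value.
# Note this also removes hyphens inside real names.
cleaned[cols] = cleaned[cols].replace(r"[\s\-]+", "", regex=True)

# Step 2: turn cells that are now empty (or whitespace-only) into NaN.
# The ^...$ anchors keep the pattern from matching non-empty cells.
cleaned[cols] = cleaned[cols].replace(r"^\s*$", np.nan, regex=True)
print(cleaned)
```

If hyphenated names can appear in the data, dropping the `\-` from the first pattern and handling the lone '-' placeholder separately would be the safer choice.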
Answer 3:
If you're sure there won't be spaces within the names, you could use .str.strip() to remove any whitespace (or any other characters you pass to it) from both ends of each string, then replace the empty strings that remain with NaN.
df3['Name'] = df3['Name'].str.strip().replace('', np.nan)
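A small sketch of the strip-based approach, using hypothetical sample values modelled on the question. It assumes the lone '-' left after stripping is a placeholder for "no value" and should become NaN as well.

```python
import numpy as np
import pandas as pd

# Hypothetical values: a raw placeholder, a raw name, a blank cell,
# and a name with trailing whitespace.
s = pd.Series(["\t\t-\r\n\t", "\t\tGuest\r\n\t", "   ", "Guest "], name="Name")

# .str.strip() with no argument trims whitespace (tabs, CR, LF, spaces)
# from both ends of each string; inner characters are left alone.
stripped = s.str.strip()

# Treat empty strings and the lone '-' placeholder as missing.
result = stripped.replace(["", "-"], np.nan)
print(result)
```

Unlike the regex-based removal of all whitespace, this keeps spaces inside a value, so a name such as "Guest User" would survive intact.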
Source: https://stackoverflow.com/questions/37837148/replaces-spaces-with-nan-in-pandas-dataframe