pandas dataframe replace blanks with NaN

Submitted by Deadly on 2019-12-07 00:52:00

Question


I have a dataframe with empty cells and would like to replace these empty cells with NaN. A solution previously proposed on this forum works, but only if the cell contains a space:

df.replace(r'\s+',np.nan,regex=True)

This code does not work when the cell is empty. Does anyone have a suggestion for pandas code that replaces empty cells?

Wannes
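The reason the pattern misses empty cells can be checked with Python's re module alone: \s+ requires at least one whitespace character, so it can never match a zero-length string. A minimal sketch follows; the variable names and the alternative pattern ^\s*$ are illustrative and not part of the question.

```python
import re

# The pattern from the question: one or more whitespace characters.
one_or_more = re.compile(r'\s+')
# Illustrative alternative: zero or more whitespace characters spanning the whole cell.
blank = re.compile(r'^\s*$')

# Print, for each sample cell, whether each pattern finds a match.
for cell in ['', ' ', 'asasd']:
    print(repr(cell), bool(one_or_more.search(cell)), bool(blank.search(cell)))
```

The empty string is the only sample where the two patterns disagree, which is exactly the case the question runs into.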


Answer 1:


I think the easiest thing here is to do the replace twice:

In [117]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a':['',' ','asasd']})
df

Out[117]:
       a
0       
1       
2  asasd

In [118]:
df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)

Out[118]:
       a
0    NaN
1    NaN
2  asasd



Answer 2:


Neither of the other answers takes into account all the characters in a string. This is better:

df.replace(r'\s+( +\.)|#',np.nan,regex=True).replace('',np.nan)

See also: Replacing blank values (white space) with NaN in pandas




Answer 3:


How about this?

df.replace(r'\s+|^$', np.nan, regex=True)
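One caveat with an unanchored \s+: as far as I can tell, when the replacement value is not a string, pandas swaps out the whole cell if the pattern matches anywhere inside it, so a value with an internal space may be turned into NaN as well. A sketch of an anchored variant, assuming every cell is a string; the sample data and the name cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: an empty cell, a whitespace-only cell,
# a normal value, and a value with an internal space.
df = pd.DataFrame({'a': ['', ' ', 'asasd', 'two words']})

# ^\s*$ can only match a cell that is empty or whitespace-only from start to end.
cleaned = df.replace(r'^\s*$', np.nan, regex=True)
print(cleaned)
```

Anchoring the pattern handles empty and whitespace-only cells in one pass while leaving 'two words' alone.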



Answer 4:


As you've already seen, if you do the obvious thing and replace() with None it throws an error:

df.replace('', None)
TypeError: cannot replace [''] with method pad on a DataFrame

The solution seems to be to simply replace the empty string with numpy's NaN.

import numpy as np
df.replace('', np.nan)

(np.nan is the spelling to use; the np.NaN alias was removed in NumPy 2.0.)

While I'm not 100% sure that pandas treats np.nan in exactly the same way as its native missing values across all edge cases, I've not had any problems: fillna() works, persisting NULLs to a database in place of np.nan works, and persisting NaN to CSV works.

(pandas version 0.18.1)
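The checks mentioned above (missing-value detection, fillna() and CSV output) can be tried on a toy frame. This is only a sketch under assumed data: the column names, the 'missing' filler and the 'NA' marker are invented for the example, and the database step is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two empty-string cells in column 'a'.
df = pd.DataFrame({'id': [1, 2, 3], 'a': ['', 'x', '']})

# Exact-match replacement of the empty string; no regex involved.
cleaned = df.replace('', np.nan)

# The replaced cells are recognised as missing.
n_missing = int(cleaned['a'].isna().sum())

# fillna() treats them like any other missing value.
filled = cleaned['a'].fillna('missing')

# to_csv() writes them with the chosen na_rep marker.
csv_text = cleaned.to_csv(index=False, na_rep='NA')
print(n_missing)
print(filled.tolist())
print(csv_text)
```

An explicit na_rep is used only to make the missing cells visible in the output; by default to_csv() writes them as empty fields.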



Source: https://stackoverflow.com/questions/30392720/pandas-dataframe-replace-blanks-with-nan
