pandas: Dataframe.replace() with regex

Submitted by 心已入冬 on 2021-02-16 19:10:33

Question


I have a table which looks like this:

df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))

    A       B
0   1.00    1.0
1   -1      -45.00
2   NaN     -

I would like to replace '-' with '0.00' using DataFrame.replace(), but it fails because of the negative values '-1' and '-45.00'.

How can I leave the negative values alone and replace only the cells that contain just '-' with '0.00'?

My code:

df_raw = df_raw.replace(['-', r'\*'], ['0.00', '0.00'], regex=True).astype(np.float64)

Error message:

ValueError: invalid literal for float(): 0.0045.00

Answer 1:


Your regex matches every - character, including the minus signs of the negative numbers:

In [48]:
df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True)

Out[48]:
       A          B
0   1.00        1.0
1  0.001  0.0045.00
2    NaN       0.00

If you anchor the pattern so that it only matches a cell consisting of that single character, it works as expected:

In [47]:
df_raw.replace(['^-$'], ['0.00'], regex=True)

Out[47]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN    0.00

Here ^ matches the start of the string and $ matches the end, so the pattern only matches a cell whose entire content is a single '-'.
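
As a minimal sketch of the difference, assuming pandas and NumPy are installed, the snippet below rebuilds df_raw from the question and runs the unanchored and the anchored pattern side by side. The variable names broken, cleaned and result are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Same data as in the question; column A is shorter, so row 2 becomes NaN.
df_raw = pd.DataFrame({
    "A": pd.Series(["1.00", "-1"]),
    "B": pd.Series(["1.0", "-45.00", "-"]),
})

# Unanchored pattern: every '-' inside a cell is substituted,
# so '-45.00' turns into '0.0045.00' and can no longer be parsed as a float.
broken = df_raw.replace(r"-", "0.00", regex=True)

# Anchored pattern: only cells that are exactly '-' are substituted.
cleaned = df_raw.replace(r"^-$", "0.00", regex=True)

# With the placeholder gone, the conversion from the question succeeds.
result = cleaned.astype(np.float64)
print(result)
```

The negative values keep their sign because the anchors prevent a match anywhere except on a cell that is a lone '-'.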

Alternatively, call replace without regex=True, which only replaces cells whose whole value matches exactly:

In [29]:

df_raw.replace('-',0)
Out[29]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN       0
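
The exact-match form also covers the '*' placeholder from the question's code without any regex escaping. The following is a rough sketch under the assumption that both '-' and '*' can appear as whole-cell placeholders. The frame df_demo and its '*' cell are hypothetical and were made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: '-' and '*' both stand for "no value".
df_demo = pd.DataFrame({
    "A": ["1.00", "-1", "*"],
    "B": ["1.0", "-45.00", "-"],
})

# Without regex=True, replace compares whole cell values,
# so '-1' and '-45.00' are left untouched.
exact = df_demo.replace(["-", "*"], "0.00")

# Every cell is now a parseable number string.
numeric = exact.astype(np.float64)
print(numeric)
```

Using the string '0.00' as the replacement, rather than the integer 0, keeps every cell a string until the final conversion.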


Source: https://stackoverflow.com/questions/32201222/pandas-dataframe-replace-with-regex
