Pandas - How to replace string with zero values in a DataFrame series?

Question

I'm importing some csv data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, Pandas reports it as a series of 'object'.

What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?

Steve

Answer 1:

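Use Series.str.replace and Series.astype:

import pandas as pd

s = pd.Series(['2$-32$-4','123$-12','00123','44'])
# regex=True is needed so that r'\$-' is read as a pattern matching a literal "$-"
s.str.replace(r'\$-', '0', regex=True).astype(float)

0    203204.0
1    123012.0
2       123.0
3        44.0
dtype: float64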
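
For the case described in the question, where "$-" is the whole cell rather than part of a number, the same two calls apply. A minimal sketch, assuming a hypothetical series named prices with made-up values:

```python
import pandas as pd

# Hypothetical data: numeric strings mixed with the spurious "$-" marker
prices = pd.Series(['1.5', '$-', '3', '$-'])

# Replace the literal "$-" with "0" (regex=False means no escaping is needed),
# then convert the whole series to float
cleaned = prices.str.replace('$-', '0', regex=False).astype(float)

print(cleaned.tolist())
print(cleaned.dtype)
```

Note that .str methods return NaN for entries that are not strings, so this route fits best when every value in the column was read in as text.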

Answer 2:

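You can use the convert_objects method of the DataFrame with convert_numeric=True to change the strings to NaNs. (convert_objects was deprecated and then removed in pandas 1.0; pd.to_numeric is its replacement in current versions.)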

From the docs:

convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.

In [17]: df
Out[17]: 
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5

In [18]: df2 = df.convert_objects(convert_numeric=True)

In [19]: df2
Out[19]: 
    a   b  c
0   1   2  4
1 NaN   2  4
2   1 NaN  5

Finally, if you want to convert those NaNs to 0's, use df.fillna, which targets missing values directly (replace('NaN', 0) would look for the string 'NaN' instead):

In [20]: df2.fillna(0)
Out[20]: 
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5
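
On pandas 1.0 and later, where convert_objects no longer exists, a roughly equivalent sketch applies pd.to_numeric column by column. The frame below is a made-up stand-in that loosely mirrors the session above, not the original data:

```python
import pandas as pd

# Hypothetical frame: numeric strings plus stray text in columns a and b
df = pd.DataFrame({'a': ['1.0', 'sd', '1.0'],
                   'b': ['2.0', '2.0', 'fg'],
                   'c': [4, 4, 5]})

# Coerce each column to numeric; strings that cannot be parsed become NaN
df2 = df.apply(pd.to_numeric, errors='coerce')

# Replace the resulting NaNs with zero
df3 = df2.fillna(0)

print(df3)
```

Columns that contained NaN come back as float64, while the untouched integer column c keeps its integer dtype.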

Answer 3:

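Use pd.to_numeric to convert the strings to numeric values, setting unparseable strings to NaN with errors='coerce'. Here df is the Series in question, since pd.to_numeric expects one-dimensional input:

df = pd.to_numeric(df, errors='coerce')

and then convert the NaN values to zeros using fillna (replace('NaN', 0) would look for the string 'NaN' rather than missing values):

df = df.fillna(0)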
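
Applied to one column of a DataFrame, the two steps can be chained. This is a minimal sketch; the column name amount and its values are invented for illustration, and fillna(0) is used for the NaN-to-zero step:

```python
import pandas as pd

# Hypothetical frame with one column that should be numeric
df = pd.DataFrame({'amount': ['10', '$-', '2.5', '$-']})

# Coerce to numeric (non-numeric strings become NaN), then fill NaN with 0
df['amount'] = pd.to_numeric(df['amount'], errors='coerce').fillna(0)

print(df['amount'].tolist())
print(df['amount'].dtype)
```

Unlike the string-replacement approach, this turns every non-numeric string into zero, not only "$-", which matches the more general form of the question.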

Source: https://stackoverflow.com/questions/33440234/pandas-how-to-replace-string-with-zero-values-in-a-dataframe-series

Tags: python, pandas, dataframe